1.5.1 Principal Component Analysis (PCA)
Principal
Component Analysis employs a linear transformation that is based on
preserving the most variance in the data using the least number of
dimensions. The original data set in a higher dimensional space is
mapped to a lower dimension space with maximum variance. The process
involves the construction of the covariance matrix of the original data
set. The eigenvectors of the covariance matrix of the data are referred
to as principal axes, and the projection of the data instances on to
these principal axes are called the principal components. Dimensionality
reduction is obtained by only retaining the axes (dimensions) that
account for most of the variance, and discarding all others.
[A] The percentage contributions for each principal
component derived from the analysis in explaining the variance in the
dataset are given as follows:
[A.1] PC1 =
37%
[A.2] Combined
PC1 and
PC2 = 65%
[A.3] Combined
PC1,
PC2 and
PC3 = 82%
[B] With eigenvalues sufficiently greater than 1, only
PC1 and
PC2 were used in the subsequent
exploratory analysis.
[B.1] PC1 =
2.22
[B.1] PC2 =
1.67
[C] PC1 can be
described as having a collectively high or low critic / audience scores,
oscar nominations and oscar winnability rate, as characterized by the
following descriptor variables, ranked based on contribution and
statistical significance:
[C.1] High
Tomatometer_Critic = 23.09%
[C.2] High
IMDB_Audience = 22.92%
[C.3] High
Tomatometer_Audience = 19.56%
[C.4] High
IMDB_Critic = 14.87%
[C.5] High
Nomination_SuccessRate = 13.73%
[C.6] High
Nominations_Total = 5.82%
[D] PC2 can be
described as having high critic scores but low audience scores and oscar
winnability rate, or having low critic scores but high audience scores
and oscar winnability rate, as characterized by the following descriptor
variables, ranked based on importance and statistical
significance:
[D.1] High
IMDB_Critic = 36.44%
[D.2] High
Tomatometer_Critic = 22.50%
[D.3] Low
IMDB_Audience = 16.37%
[D.4] Low
Tomatometer_Audience = 15.68%
[D.5] Low
Nomination_SuccessRate = 8.54%
[E] With both
PC1
and
PC2 considered,the following
relationships were observed among descriptor variables:
[E.1] Critic and audience scores are the most
represented descriptor variables for both principal components.
[E.1.1] IMDB_Critic
[E.1.2] Tomatometer_Critic
[E.1.3] IMDB_Audience
[E.1.4] Tomatometer_Audience
[E.2] Descriptor variables demonstrated
correlation and clustered around three groups:
[E.2.1] Cluster 1 (
IMDB_Critic,
Tomatometer_Critic) indicates that critic
scores are consistent for both sources (Tomatometer and IMDB).
[E.2.2] Cluster 2 (
IMDB_Audience,
Tomatometer_Audience,
Nomination_SuccessRate) indicates that
audience scores are consistent for both sources (Tomatometer and IMDB)
and are positively correlated with oscar winnability rate.
[E.2.3] Cluster 3 (
Nominations_Total) indicates that the
number of oscar nominations is not correlated with critic and audience
scores, as well as oscar winnability rate.
[F] With both
PC1
and
PC2 considered,the following
relationships were observed among individual instances:
[F.1] Movies which won the oscar best picture
award (e.g. Parasite, The King’s Speech, 12 Years a Slave, Spotlight,
The Artist, Argo, The Hurt Locker and Everything Everywhere All At Once)
generally had higher critic scores, audience scores, oscar nominations
and oscar winnability rate, than those movies which were not given such
an award. Other movie nominees which had the same characteristics
included Toy Story 3, The Grand Budapest Hotel, The Father, La La Land,
Gravity and Up.
[F.2] Other Movies which won the oscar best
picture award (e.g. Moonlight and The Shape of Water) had higher critic
scores but generally lower audience scores. Other movie nominees which
had the same characteristics included Roma, Boyhood, Drive My Car, The
Irishman, Minari and Manchester by the Sea.
[F.3] Other Movies which won the oscar best
picture award (e.g. Green Book and CODA) had higher audience scores but
generally lower critic scores. Other movie nominees which had the same
characteristics included Inception, Dune, Top Gun: Maverick, Django
Unchaied, Ford V Ferrari and Dallas Buyers Club.
[F.4] Movie nominees which had both generally
lower critic and audience scores including Extremely Loud &
Incredibly Close, Don’t Look Up, Vice and Triangle of Sadness highly
contributed the variances captured by
PC1. Movie nominees which had had
extremely lower critic scores but reasonably audience scores including
Bohemian Rhapsody, The Blind Side and Joker highly contributed the
variances captured by
PC2.
[F.5] There is no clear pattern in characteristics
when movies are grouped by the year or release.
##################################
# Loading dataset
##################################
SD_PCA <- Oscars.PCA
str(SD_PCA)
## 'data.frame': 126 obs. of 8 variables:
## $ Tomatometer_Critic : int 82 66 90 93 97 89 92 89 98 90 ...
## $ Tomatometer_Audience : int 82 85 82 80 84 88 74 68 90 79 ...
## $ IMDB_Critic : int 83 53 81 85 95 69 78 88 88 83 ...
## $ IMDB_Audience : int 79 76 79 73 75 83 73 70 83 74 ...
## $ Nominations_Total : int 9 2 4 3 9 8 6 2 5 6 ...
## $ Nomination_SuccessRate: num 0.333 0.5 0 0 0.667 ...
## $ Year : Factor w/ 14 levels "2010","2011",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ Picture : Factor w/ 2 levels "NOM","WON": 1 1 1 1 2 1 1 1 1 1 ...
##################################
# Performing PCA
##################################
DR_PCA <- PCA(SD_PCA[,c(1:6)],
scale.unit = TRUE,
graph = FALSE)
##################################
# Obtaining the PCA eigenvalues
##################################
(DR_PCA_EV <- get_eigenvalue(DR_PCA))
## eigenvalue variance.percent cumulative.variance.percent
## Dim.1 2.21987218 36.997870 36.99787
## Dim.2 1.66513612 27.752269 64.75014
## Dim.3 1.01373817 16.895636 81.64577
## Dim.4 0.67604616 11.267436 92.91321
## Dim.5 0.33051908 5.508651 98.42186
## Dim.6 0.09468828 1.578138 100.00000
##################################
# Formulating the Scree Plot
# for the variances
##################################
(DR_PCA_VarianceScreePlot <- fviz_eig(DR_PCA,
addlabels = TRUE,
ylim = c(0, 100),
choice = c("variance")) +
labs(title = "Principal Component Analysis : Scree Plot of Explained Variances",
subtitle = "Top 2 Principal Components",
y = "Percentage of Explained Variances",
x = "Principal Components") +
theme_classic())

##################################
# Formulating the Scree Plot
# for the eigenvalues
##################################
(DR_PCA_EigenvalueScreePlot <- fviz_eig(DR_PCA,
addlabels = TRUE,
ylim = c(0, 6),
choice = c("eigenvalue")) +
labs(title = "Principal Component Analysis : Scree Plot of Eigenvalues",
subtitle = "Top 2 Principal Components",
y = "Eigenvalues",
x = "Principal Components") +
theme_classic())

##################################
# Extracting the PCA
# descriptor variable loadings
##################################
DR_PCA_VAR <- get_pca_var(DR_PCA)
##################################
# Extracting the coordinates
# representing the loadings
# for the descriptor variables
##################################
DR_PCA_VAR$coord
## Dim.1 Dim.2 Dim.3 Dim.4
## Tomatometer_Critic 0.7160011 0.61208661 -0.24062432 -0.01481838
## Tomatometer_Audience 0.6589988 -0.51101999 -0.35527700 0.23993434
## IMDB_Critic 0.5746051 0.77898937 -0.03684305 -0.06857649
## IMDB_Audience 0.7133572 -0.52207446 -0.07046626 0.19067464
## Nominations_Total 0.3594166 0.08771218 0.85255162 0.35355795
## Nomination_SuccessRate 0.5520012 -0.37718042 0.31056271 -0.67245470
## Dim.5
## Tomatometer_Critic 0.10911558
## Tomatometer_Audience 0.33343405
## IMDB_Critic -0.12768134
## IMDB_Audience -0.41907508
## Nominations_Total 0.10547110
## Nomination_SuccessRate 0.06621183
##################################
# Extracting the correlations
# between the descriptor variables
# and principal component dimensions
##################################
DR_PCA_VAR$cor
## Dim.1 Dim.2 Dim.3 Dim.4
## Tomatometer_Critic 0.7160011 0.61208661 -0.24062432 -0.01481838
## Tomatometer_Audience 0.6589988 -0.51101999 -0.35527700 0.23993434
## IMDB_Critic 0.5746051 0.77898937 -0.03684305 -0.06857649
## IMDB_Audience 0.7133572 -0.52207446 -0.07046626 0.19067464
## Nominations_Total 0.3594166 0.08771218 0.85255162 0.35355795
## Nomination_SuccessRate 0.5520012 -0.37718042 0.31056271 -0.67245470
## Dim.5
## Tomatometer_Critic 0.10911558
## Tomatometer_Audience 0.33343405
## IMDB_Critic -0.12768134
## IMDB_Audience -0.41907508
## Nominations_Total 0.10547110
## Nomination_SuccessRate 0.06621183
##################################
# Extracting the quality of representation
# for the descriptor variables
# on the factor map
##################################
DR_PCA_VAR$cos2
## Dim.1 Dim.2 Dim.3 Dim.4
## Tomatometer_Critic 0.5126576 0.374650014 0.057900065 0.0002195845
## Tomatometer_Audience 0.4342794 0.261141435 0.126221744 0.0575684853
## IMDB_Critic 0.3301711 0.606824433 0.001357411 0.0047027343
## IMDB_Audience 0.5088785 0.272561741 0.004965494 0.0363568172
## Nominations_Total 0.1291803 0.007693427 0.726844261 0.1250032235
## Nomination_SuccessRate 0.3047053 0.142265072 0.096449197 0.4521953177
## Dim.5
## Tomatometer_Critic 0.011906210
## Tomatometer_Audience 0.111178264
## IMDB_Critic 0.016302526
## IMDB_Audience 0.175623921
## Nominations_Total 0.011124154
## Nomination_SuccessRate 0.004384007
##################################
# Extracting the contributions
# (in percentage) of the descriptor variables
# to the principal components
##################################
DR_PCA_VAR$contrib
## Dim.1 Dim.2 Dim.3 Dim.4 Dim.5
## Tomatometer_Critic 23.094013 22.4996629 5.7115404 0.03248069 3.602276
## Tomatometer_Audience 19.563262 15.6828881 12.4511188 8.51546661 33.637472
## IMDB_Critic 14.873427 36.4429325 0.1339015 0.69562326 4.932401
## IMDB_Audience 22.923776 16.3687363 0.4898202 5.37786015 53.135789
## Nominations_Total 5.819268 0.4620299 71.6994072 18.49033844 3.365662
## Nomination_SuccessRate 13.726255 8.5437503 9.5142118 66.88823084 1.326400
##################################
# Extracting the quality of representation
# for the descriptor variables
# on the factor map
##################################
(DR_PCA_VariableSquaredCorrelationCircle <- fviz_pca_var(DR_PCA,
col.var = "cos2",
gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"),
legend.title = "Squared Coordinates") +
labs(title = "Principal Component Analysis : Squared Correlation Circle",
subtitle = "Descriptor Variable Representation Quality",
y = "Principal Component 2",
x = "Principal Component 1") +
theme_classic() +
theme(legend.position="top"))

##################################
# Formulating clusters of the
# descriptor variables
##################################
set.seed(123)
DR_PCA_KMEANS <- kmeans(DR_PCA_VAR$coord, centers=3, nstart=25)
DR_PCA_KMEANS_CLUSTER <- as.factor(DR_PCA_KMEANS$cluster)
##################################
# Extracting the correlation
# between the descriptor variables
# and top principal components
##################################
(DR_PCA_VariableCorrelationCircle <- fviz_pca_var(DR_PCA,
col.var = DR_PCA_KMEANS_CLUSTER,
palette = c("#0073C2FF", "#EFC000FF", "#868686FF"),
legend.title = "Cluster") +
labs(title = "Principal Component Analysis : Correlation Circle",
subtitle = "Principal Components Versus Descriptor Variable Clusters",
y = "Principal Component 2",
x = "Principal Component 1") +
theme_classic() +
theme(legend.position="top"))

##################################
# Extracting the contribution
# of the descriptor variables
# for the the Top 1 and 2 principal components
##################################
(DR_PCA_PCVariableContributors <- fviz_contrib(DR_PCA,
choice = "var",
axes = 1:2) +
labs(title = "Principal Component Analysis : Descriptor Variable Contribution",
subtitle = "Principal Components 1 and 2 Contributors",
x = "Descriptors") +
theme_classic() +
theme(axis.text.x = element_text(angle = 90,hjust = 1)))

##################################
# Testing the statistical significance
# of the contribution
# of the descriptor variables
# for the the Top 1 and 2 principal components
##################################
DR_PCA_VariableContributionTest <- dimdesc(DR_PCA,
axes = c(1,2),
proba = 0.05)
##################################
# Extracting the contribution
# of the descriptor variables
# for the Top 1 principal component
##################################
(DR_PCA_PC1VariableContributors <- fviz_contrib(DR_PCA,
choice = "var",
axes = 1) +
labs(title = "Principal Component Analysis : Descriptor Variable Contribution",
subtitle = "Principal Component 1 Contributors",
x = "Descriptors") +
theme_classic() +
theme(axis.text.x = element_text(angle = 90,hjust = 1)))

##################################
# Testing the statistical significance
# of the contribution
# of the descriptor variables
# for the Top 1 principal component
##################################
DR_PCA_VariableContributionTest$Dim.1
##
## Link between the variable and the continuous variables (R-square)
## =================================================================================
## correlation p.value
## Tomatometer_Critic 0.7160011 4.384744e-21
## IMDB_Audience 0.7133572 7.103853e-21
## Tomatometer_Audience 0.6589988 4.925052e-17
## IMDB_Critic 0.5746051 1.986144e-12
## Nomination_SuccessRate 0.5520012 2.086140e-11
## Nominations_Total 0.3594166 3.577176e-05
##################################
# Extracting the contribution
# of the descriptor variables
# for the Top 2 principal component
##################################
(DR_PCA_PC2VariableContributors <- fviz_contrib(DR_PCA,
choice = "var",
axes = 2) +
labs(title = "Principal Component Analysis : Descriptor Variable Contribution",
subtitle = "Principal Component 2 Contributors",
x = "Descriptors") +
theme_classic() +
theme(axis.text.x = element_text(angle = 90,hjust = 1)))

##################################
# Testing the statistical significance
# of the contribution
# of the descriptor variables
# for the Top 2 principal component
##################################
DR_PCA_VariableContributionTest$Dim.2
##
## Link between the variable and the continuous variables (R-square)
## =================================================================================
## correlation p.value
## IMDB_Critic 0.7789894 6.683142e-27
## Tomatometer_Critic 0.6120866 2.639968e-14
## Nomination_SuccessRate -0.3771804 1.339456e-05
## Tomatometer_Audience -0.5110200 9.715208e-10
## IMDB_Audience -0.5220745 3.624252e-10
##################################
# Extracting the PCA individual scores
##################################
DR_PCA_IND <- get_pca_ind(DR_PCA)
##################################
# Extracting the coordinates
# representing the scores
# for the individuals
##################################
DR_PCA_IND$coord[,c("Dim.1","Dim.2")]
## Dim.1 Dim.2
## Avatar 0.267696540 -0.530442152
## The Blind Side -2.195453708 -3.301208059
## District 9 -0.430797744 0.112896306
## An Education -1.058977961 1.182366388
## The Hurt Locker 1.705555201 0.769878023
## Inglourious Basterds 0.487479202 -1.390603430
## Precious: Based on the Novel 'Push' by Sapphire -0.824912508 0.640551052
## A Serious Man -2.274858904 2.012978851
## Up 1.968871570 -0.415625889
## Up in the Air -0.948551664 0.937195852
## Black Swan -0.088494191 -0.715371246
## The Fighter 0.593337935 -0.511337506
## Inception 2.000141368 -2.400531058
## The Kids Are All Right -1.677629729 1.814279497
## The King's Speech 2.022535144 -0.140319446
## 127 Hours -0.376296841 0.640551980
## The Social Network 1.571785943 0.646156669
## Toy Story 3 2.112408569 -0.190945586
## True Grit 0.143820607 0.632433896
## Winter's Bone -1.197590702 1.942394429
## The Artist 1.829965512 0.030391009
## The Descendants -0.928054041 0.633597837
## Extremely Loud & Incredibly Close -6.607718620 -2.191543916
## The Help -0.780439361 -2.582418858
## Hugo 0.540836116 0.530503658
## Midnight in Paris -0.016386835 0.074821615
## Moneyball 0.032152643 0.821762000
## The Tree of Life -3.183193524 2.259276875
## War Horse -2.651283351 -0.011313828
## Amour 0.676678866 0.800609934
## Argo 1.295486139 -0.004493834
## Beasts of the Southern Wild -1.641594932 1.198443100
## Django Unchained 1.368398632 -1.582284768
## Les Miserables -1.786154693 -1.772901843
## Life of Pi 0.680921689 -0.635955517
## Lincoln -0.026274204 1.034913302
## Silver Linings Playbook 0.237579503 0.153926881
## Zero Dark Thirty -0.146040932 1.308036691
## American Hustle -0.743744716 1.987726848
## Captain Phillips 0.207250822 0.136428053
## Dallas Buyers Club 1.095154638 -1.098634659
## Gravity 1.808662879 0.792462032
## Her 0.707449019 0.525283745
## Nebraska -0.204598544 0.653651613
## Philomena -0.503197773 -0.094529799
## 12 Years a Slave 2.100854560 0.279285345
## The Wolf of Wall Street -0.663039400 -1.076259636
## American Sniper -1.844409883 -0.993999809
## Birdman or (The Unexpected Virtue of Ignorance) 0.621586316 0.408398188
## Boyhood 0.989609154 1.460184615
## The Grand Budapest Hotel 1.634209716 -0.286179436
## The Imitation Game 0.411500335 -1.065031194
## Selma 0.411542825 -0.068773038
## The Theory of Everything -1.003995061 -1.116933111
## Whiplash 2.611370669 -1.249717667
## The Big Short 0.170060595 -0.385300448
## Bridge of Spies 0.009676416 0.045721157
## Brooklyn -0.114157904 1.010762460
## Mad Max: Fury Road 2.337559214 -0.106023934
## The Martian 0.473734478 -0.357771031
## The Revenant 0.161937864 -1.138614186
## Room 1.196463021 -0.545134110
## Spotlight 1.989250915 -0.009168248
## Arrival 0.367309471 0.248612994
## Fences -1.186448927 0.815944510
## Hacksaw Ridge 0.404570037 -1.823823948
## Hell or High Water 0.153324144 0.884752265
## Hidden Figures -0.134558405 -0.586009309
## La La Land 1.829415815 0.497221042
## Lion -0.381467753 -1.365827706
## Manchester by the Sea 0.858671370 1.150335634
## Moonlight 0.896841919 1.777793051
## Call Me by Your Name 0.788578813 0.607403297
## Darkest Hour -0.795263816 -0.439185974
## Dunkirk 0.996564353 0.675675583
## Get Out 0.555076960 0.412865802
## Lady Bird -0.206141307 1.941822125
## Phantom Thread -0.786011564 1.540594077
## The Post -1.990318557 1.229655961
## The Shape of Water -0.018542859 1.459937812
## Three Billboards outside Ebbing, Missouri 1.118764632 -0.268033382
## Black Panther 0.279104946 1.057646892
## BlacKkKlansman 0.004769683 0.717101841
## Bohemian Rhapsody -1.483775371 -4.490215441
## The Favourite -0.331512533 1.843510513
## Green Book 0.468063042 -2.803170545
## Roma 0.714350361 1.696574201
## A Star Is Born -0.066617113 0.839660667
## Vice -3.934949029 -0.587917055
## Ford v Ferrari 1.682435065 -1.473745599
## The Irishman 0.944030071 1.164389790
## Jojo Rabbit -0.648230212 -2.470061984
## Joker -0.430208286 -3.099021343
## Little Women 1.096244053 0.365296701
## Marriage Story 0.942403658 0.774052883
## 1917 1.129721748 -1.028531120
## Once upon a Time...in Hollywood -0.741237528 0.734591165
## Parasite 3.117196230 -0.468032148
## The Father 1.929415735 -0.291670041
## Judas and the Black Messiah 0.960737989 0.162489630
## Mank -2.345763071 1.570456017
## Minari 0.422910493 1.064998675
## Nomadland 0.428470433 0.797058184
## Promising Young Woman -0.483645104 -0.425451081
## Sound of Metal 0.944143148 -0.069194279
## The Trial of the Chicago 7 -0.285898871 -0.355149365
## Belfast -0.511582058 -0.429370787
## CODA 1.758254848 -2.156598920
## Don't Look Up -4.442451275 -2.539922441
## Drive My Car 0.158306636 1.192487916
## Dune 1.122203185 -1.826980124
## King Richard 0.257879939 -0.723422523
## Licorice Pizza -1.872711748 2.147715796
## Nightmare Alley -3.160384027 0.636761057
## The Power of the Dog -0.720311839 2.283065628
## West Side Story 0.044190625 0.446398144
## All Quiet on the Western Front 0.952314310 -0.952191110
## Avatar: The Way of Water -0.923072631 -2.037690509
## The Banshees of Inisherin -0.069099854 1.439223633
## Elvis -1.434462767 -1.387741046
## Everything Everywhere All at Once 1.594787307 -0.536995405
## The Fabelmans -0.253962254 0.721795892
## Tar -0.777644395 1.584983545
## Top Gun: Maverick 1.673321316 -1.169382857
## Triangle of Sadness -3.612891352 -0.558900276
## Women Talking -1.120884714 0.372860732
##################################
# Extracting the quality of representation
# for the individuals
# on the factor map
##################################
DR_PCA_IND$cos2[,c("Dim.1","Dim.2")]
## Dim.1 Dim.2
## Avatar 3.473559e-02 1.363846e-01
## The Blind Side 2.654492e-01 6.001760e-01
## District 9 7.241199e-02 4.973054e-03
## An Education 2.532809e-01 3.157423e-01
## The Hurt Locker 3.809304e-01 7.761718e-02
## Inglourious Basterds 4.920414e-02 4.004019e-01
## Precious: Based on the Novel 'Push' by Sapphire 2.482848e-01 1.497069e-01
## A Serious Man 4.651816e-01 3.642437e-01
## Up 7.139238e-01 3.181429e-02
## Up in the Air 4.213049e-01 4.112778e-01
## Black Swan 5.990845e-03 3.914908e-01
## The Fighter 4.524012e-01 3.359966e-01
## Inception 3.476223e-01 5.007269e-01
## The Kids Are All Right 4.166793e-01 4.873243e-01
## The King's Speech 5.886048e-01 2.833136e-03
## 127 Hours 8.087118e-02 2.343373e-01
## The Social Network 7.817410e-01 1.321149e-01
## Toy Story 3 7.432579e-01 6.072995e-03
## True Grit 5.845162e-03 1.130276e-01
## Winter's Bone 2.367355e-01 6.227610e-01
## The Artist 7.071045e-01 1.950240e-04
## The Descendants 5.169605e-01 2.409563e-01
## Extremely Loud & Incredibly Close 8.695189e-01 9.564809e-02
## The Help 7.634343e-02 8.358854e-01
## Hugo 6.084654e-02 5.854385e-02
## Midnight in Paris 2.416386e-04 5.037676e-03
## Moneyball 5.166937e-04 3.375141e-01
## The Tree of Life 6.076654e-01 3.061099e-01
## War Horse 9.579033e-01 1.744329e-05
## Amour 1.885396e-01 2.639242e-01
## Argo 7.313005e-01 8.799629e-06
## Beasts of the Southern Wild 5.654711e-01 3.013792e-01
## Django Unchained 3.393579e-01 4.537347e-01
## Les Miserables 3.367003e-01 3.317223e-01
## Life of Pi 1.238295e-01 1.080148e-01
## Lincoln 1.224060e-04 1.899116e-01
## Silver Linings Playbook 6.949945e-02 2.917376e-02
## Zero Dark Thirty 8.195024e-03 6.574169e-01
## American Hustle 8.482517e-02 6.058853e-01
## Captain Phillips 2.053377e-02 8.897817e-03
## Dallas Buyers Club 3.810563e-01 3.834818e-01
## Gravity 3.507612e-01 6.733692e-02
## Her 2.375703e-01 1.309754e-01
## Nebraska 2.728446e-02 2.784857e-01
## Philomena 8.576662e-02 3.026762e-03
## 12 Years a Slave 8.635690e-01 1.526164e-02
## The Wolf of Wall Street 9.216683e-02 2.428455e-01
## American Sniper 6.302903e-01 1.830620e-01
## Birdman or (The Unexpected Virtue of Ignorance) 1.480236e-01 6.389924e-02
## Boyhood 2.276680e-01 4.956671e-01
## The Grand Budapest Hotel 7.454044e-01 2.285879e-02
## The Imitation Game 5.353337e-02 3.585988e-01
## Selma 2.783596e-02 7.773426e-04
## The Theory of Everything 4.276720e-01 5.293001e-01
## Whiplash 6.665594e-01 1.526601e-01
## The Big Short 3.656899e-02 1.877174e-01
## Bridge of Spies 2.143375e-04 4.785238e-03
## Brooklyn 2.858225e-03 2.240695e-01
## Mad Max: Fury Road 7.394092e-01 1.521134e-03
## The Martian 7.606632e-02 4.338431e-02
## The Revenant 3.895087e-03 1.925633e-01
## Room 3.736193e-01 7.755994e-02
## Spotlight 8.440420e-01 1.792907e-05
## Arrival 1.082896e-01 4.961011e-02
## Fences 3.887748e-01 1.838744e-01
## Hacksaw Ridge 4.540692e-02 9.227848e-01
## Hell or High Water 6.426268e-03 2.139839e-01
## Hidden Figures 3.432981e-03 6.511166e-02
## La La Land 3.054422e-01 2.256333e-02
## Lion 3.165926e-02 4.058607e-01
## Manchester by the Sea 2.344310e-01 4.207366e-01
## Moonlight 1.580445e-01 6.210264e-01
## Call Me by Your Name 2.288301e-01 1.357617e-01
## Darkest Hour 4.279259e-01 1.305098e-01
## Dunkirk 4.223402e-01 1.941459e-01
## Get Out 1.483425e-01 8.206861e-02
## Lady Bird 9.052534e-03 8.032653e-01
## Phantom Thread 1.701626e-01 6.537063e-01
## The Post 5.387903e-01 2.056562e-01
## The Shape of Water 3.675095e-05 2.278158e-01
## Three Billboards outside Ebbing, Missouri 6.958869e-01 3.994284e-02
## Black Panther 2.586023e-02 3.713459e-01
## BlacKkKlansman 2.405407e-05 5.437147e-01
## Bohemian Rhapsody 8.015624e-02 7.340665e-01
## The Favourite 1.822222e-02 5.634984e-01
## Green Book 2.383626e-02 8.549259e-01
## Roma 8.290181e-02 4.676136e-01
## A Star Is Born 3.819042e-03 6.067229e-01
## Vice 7.328548e-01 1.635957e-02
## Ford v Ferrari 4.016954e-01 3.082231e-01
## The Irishman 1.801827e-01 2.741184e-01
## Jojo Rabbit 5.215222e-02 7.572338e-01
## Joker 1.159945e-02 6.019064e-01
## Little Women 4.704613e-01 5.223972e-02
## Marriage Story 3.958672e-01 2.670649e-01
## 1917 3.152575e-01 2.613108e-01
## Once upon a Time...in Hollywood 1.155997e-01 1.135360e-01
## Parasite 7.896317e-01 1.780113e-02
## The Father 8.341746e-01 1.906290e-02
## Judas and the Black Messiah 2.520046e-01 7.208583e-03
## Mank 4.052969e-01 1.816588e-01
## Minari 7.906691e-02 5.014135e-01
## Nomadland 5.220968e-02 1.806714e-01
## Promising Young Woman 1.636160e-01 1.266110e-01
## Sound of Metal 5.282852e-01 2.837482e-03
## The Trial of the Chicago 7 3.585289e-02 5.532499e-02
## Belfast 9.399917e-02 6.621531e-02
## CODA 1.728471e-01 2.600382e-01
## Don't Look Up 7.434192e-01 2.430129e-01
## Drive My Car 8.639278e-03 4.902157e-01
## Dune 1.778672e-01 4.714335e-01
## King Richard 1.700203e-02 1.337980e-01
## Licorice Pizza 3.582575e-01 4.712019e-01
## Nightmare Alley 9.299245e-01 3.775035e-02
## The Power of the Dog 4.566770e-02 4.587803e-01
## West Side Story 6.014453e-04 6.137346e-02
## All Quiet on the Western Front 3.088681e-01 3.087882e-01
## Avatar: The Way of Water 1.423211e-01 6.935438e-01
## The Banshees of Inisherin 1.175231e-03 5.098300e-01
## Elvis 2.363814e-01 2.212339e-01
## Everything Everywhere All at Once 3.661492e-01 4.151392e-02
## The Fabelmans 4.282869e-02 3.459595e-01
## Tar 1.591355e-01 6.610807e-01
## Top Gun: Maverick 3.950945e-01 1.929549e-01
## Triangle of Sadness 9.230537e-01 2.208950e-02
## Women Talking 1.450523e-01 1.605077e-02
##################################
# Extracting the contributions
# (in percentage) of the individuals
# to the principal components
##################################
DR_PCA_IND$contrib[,c("Dim.1","Dim.2")]
## Dim.1 Dim.2
## Avatar 2.562046e-02 1.341083e-01
## The Blind Side 1.723257e+00 5.194282e+00
## District 9 6.635113e-02 6.074901e-03
## An Education 4.009363e-01 6.663216e-01
## The Hurt Locker 1.039999e+00 2.825033e-01
## Inglourious Basterds 8.495984e-02 9.216930e-01
## Precious: Based on the Novel 'Push' by Sapphire 2.432861e-01 1.955632e-01
## A Serious Man 1.850165e+00 1.931337e+00
## Up 1.385914e+00 8.233508e-02
## Up in the Air 3.216796e-01 4.186397e-01
## Black Swan 2.799826e-03 2.439173e-01
## The Fighter 1.258652e-01 1.246221e-01
## Inception 1.430286e+00 2.746593e+00
## The Kids Are All Right 1.006222e+00 1.568874e+00
## The King's Speech 1.462492e+00 9.384593e-03
## 127 Hours 5.062472e-02 1.955638e-01
## The Social Network 8.832594e-01 1.990011e-01
## Toy Story 3 1.595355e+00 1.737797e-02
## True Grit 7.395094e-03 1.906382e-01
## Winter's Bone 5.127649e-01 1.798269e+00
## The Artist 1.197257e+00 4.402202e-04
## The Descendants 3.079272e-01 1.913406e-01
## Extremely Loud & Incredibly Close 1.561006e+01 2.289181e+00
## The Help 2.177609e-01 3.178580e+00
## Hugo 1.045762e-01 1.341394e-01
## Midnight in Paris 9.600452e-05 2.668295e-03
## Moneyball 3.696024e-04 3.218636e-01
## The Tree of Life 3.622660e+00 2.432868e+00
## War Horse 2.513123e+00 6.100969e-05
## Amour 1.637068e-01 3.055073e-01
## Argo 6.000218e-01 9.625289e-06
## Beasts of the Southern Wild 9.634596e-01 6.845648e-01
## Django Unchained 6.694633e-01 1.193298e+00
## Les Miserables 1.140616e+00 1.498129e+00
## Life of Pi 1.657661e-01 1.927672e-01
## Lincoln 2.468088e-04 5.104905e-01
## Silver Linings Playbook 2.017992e-02 1.129298e-02
## Zero Dark Thirty 7.625190e-03 8.154917e-01
## American Hustle 1.977649e-01 1.883186e+00
## Captain Phillips 1.535656e-02 8.871296e-03
## Dallas Buyers Club 4.287976e-01 5.752893e-01
## Gravity 1.169544e+00 2.993206e-01
## Her 1.789336e-01 1.315127e-01
## Nebraska 1.496603e-02 2.036444e-01
## Philomena 9.052716e-02 4.259094e-03
## 12 Years a Slave 1.577951e+00 3.717714e-02
## The Wolf of Wall Street 1.571738e-01 5.520950e-01
## American Sniper 1.216232e+00 4.709256e-01
## Birdman or (The Unexpected Virtue of Ignorance) 1.381352e-01 7.949638e-02
## Boyhood 3.501297e-01 1.016238e+00
## The Grand Budapest Hotel 9.548102e-01 3.903521e-02
## The Imitation Game 6.053992e-02 5.406353e-01
## Selma 6.055243e-02 2.254323e-03
## The Theory of Everything 3.603833e-01 5.946125e-01
## Whiplash 2.438027e+00 7.443951e-01
## The Big Short 1.033972e-02 7.075852e-02
## Bridge of Spies 3.347577e-05 9.963551e-04
## Brooklyn 4.659223e-03 4.869428e-01
## Mad Max: Fury Road 1.953560e+00 5.357813e-03
## The Martian 8.023641e-02 6.100846e-02
## The Revenant 9.375583e-03 6.179209e-01
## Room 5.117997e-01 1.416402e-01
## Spotlight 1.414753e+00 4.006383e-05
## Arrival 4.823538e-02 2.945963e-02
## Fences 5.032683e-01 3.173226e-01
## Hacksaw Ridge 5.851792e-02 1.585424e+00
## Hell or High Water 8.404707e-03 3.730981e-01
## Hidden Figures 6.473261e-03 1.636774e-01
## La La Land 1.196538e+00 1.178362e-01
## Lion 5.202561e-02 8.891429e-01
## Manchester by the Sea 2.636061e-01 6.307088e-01
## Moonlight 2.875632e-01 1.506406e+00
## Call Me by Your Name 2.223267e-01 1.758466e-01
## Darkest Hour 2.261122e-01 9.193410e-02
## Dunkirk 3.550685e-01 2.175986e-01
## Get Out 1.101559e-01 8.124517e-02
## Lady Bird 1.519258e-02 1.797209e+00
## Phantom Thread 2.208815e-01 1.131244e+00
## The Post 1.416272e+00 7.206874e-01
## The Shape of Water 1.229291e-04 1.015894e+00
## Three Billboards outside Ebbing, Missouri 4.474855e-01 3.424187e-02
## Black Panther 2.785073e-02 5.331644e-01
## BlacKkKlansman 8.133558e-06 2.450989e-01
## Bohemian Rhapsody 7.871143e-01 9.609794e+00
## The Favourite 3.929175e-02 1.619836e+00
## Green Book 7.832676e-02 3.745232e+00
## Roma 1.824417e-01 1.371909e+00
## A Star Is Born 1.586621e-03 3.360372e-01
## Vice 5.535791e+00 1.647448e-01
## Ford v Ferrari 1.011994e+00 1.035201e+00
## The Irishman 3.186201e-01 6.462142e-01
## Jojo Rabbit 1.502312e-01 2.908007e+00
## Joker 6.616968e-02 4.577505e+00
## Little Women 4.296512e-01 6.360206e-02
## Marriage Story 3.175232e-01 2.855755e-01
## 1917 4.562937e-01 5.042136e-01
## Once upon a Time...in Hollywood 1.964338e-01 2.572001e-01
## Parasite 3.474000e+00 1.044074e-01
## The Father 1.330924e+00 4.054743e-02
## Judas and the Black Messiah 3.299981e-01 1.258436e-02
## Mank 1.967296e+00 1.175523e+00
## Minari 6.394379e-02 5.406022e-01
## Nomadland 6.563617e-02 3.028027e-01
## Promising Young Woman 8.362865e-02 8.627381e-02
## Sound of Metal 3.186964e-01 2.282023e-03
## The Trial of the Chicago 7 2.922311e-02 6.011762e-02
## Belfast 9.356902e-02 8.787082e-02
## CODA 1.105262e+00 2.216759e+00
## Don't Look Up 7.055809e+00 3.074827e+00
## Drive My Car 8.959829e-03 6.777783e-01
## Dune 4.502404e-01 1.590916e+00
## King Richard 2.377588e-02 2.494387e-01
## Licorice Pizza 1.253844e+00 2.198535e+00
## Nightmare Alley 3.570929e+00 1.932559e-01
## The Power of the Dog 1.854994e-01 2.484371e+00
## West Side Story 6.981710e-04 9.497832e-02
## All Quiet on the Western Front 3.242367e-01 4.321435e-01
## Avatar: The Way of Water 3.046304e-01 1.979047e+00
## The Banshees of Inisherin 1.707087e-03 9.872708e-01
## Elvis 7.356649e-01 9.179026e-01
## Everything Everywhere All at Once 9.092997e-01 1.374424e-01
## The Fabelmans 2.305897e-02 2.483182e-01
## Tar 2.162039e-01 1.197372e+00
## Top Gun: Maverick 1.001060e+00 6.517682e-01
## Triangle of Sadness 4.666715e+00 1.488841e-01
## Women Talking 4.491831e-01 6.626329e-02
##################################
# Extracting the correlation
# between the individual instances
# grouped by Picture categories
# and top principal components
##################################
(DR_PCA_IndividualCorrelationCircleByPicture <- fviz_pca_ind(DR_PCA,
geom.ind = "text",
col.ind = SD_PCA$Picture,
palette = c("#888888","#5544FF"),
repel = TRUE,
legend.title = "Picture",
addEllipses = FALSE) +
labs(title = "Principal Component Analysis : Correlation Circle",
subtitle = "Principal Components Versus Individuals Grouped by Picture Categories",
y = "Principal Component 2",
x = "Principal Component 1") +
theme_classic() +
theme(legend.position="top"))

##################################
# Extracting the correlation
# between the individual instances
# grouped by Year categories
# and top principal components
##################################
(DR_PCA_IndividualCorrelationCircleByYear <- fviz_pca_ind(DR_PCA,
geom.ind = "text",
col.ind = SD_PCA$Year,
repel = TRUE,
legend.title = "Year",
addEllipses = FALSE) +
labs(title = "Principal Component Analysis : Correlation Circle",
subtitle = "Principal Components Versus Individuals Grouped by Picture Categories",
y = "Principal Component 2",
x = "Principal Component 1") +
theme_classic() +
theme(legend.position="top"))

##################################
# Extracting the contribution
# of the descriptor variables
# for the Top 1 and Top 2 principal components
##################################
(DR_PCA_PCIndividualContributors <- fviz_contrib(DR_PCA,
choice = "ind",
axes = 1:2,
top = 10) +
labs(title = "Principal Component Analysis : Individual Contribution",
subtitle = "Principal Components 1 and 2 Contributors",
x = "Descriptors") +
theme_classic() +
theme(axis.text.x = element_text(angle = 90,hjust = 1)))

##################################
# Extracting the contribution
# of the descriptor variables
# for the Top 1 principal component
##################################
(DR_PCA_PC1IndividualContributors <- fviz_contrib(DR_PCA,
choice = "ind",
axes = 1,
top = 10) +
labs(title = "Principal Component Analysis : Individual Contribution",
subtitle = "Principal Component 1 Contributors",
x = "Descriptors") +
theme_classic() +
theme(axis.text.x = element_text(angle = 90,hjust = 1)))

##################################
# Extracting the contribution
# of the descriptor variables
# for the Top 2 principal component
##################################
(DR_PCA_PC2IndividualContributors <- fviz_contrib(DR_PCA,
choice = "ind",
axes = 2,
top = 10) +
labs(title = "Principal Component Analysis : Individual Contribution",
subtitle = "Principal Component 2 Contributors",
x = "Descriptors") +
theme_classic() +
theme(axis.text.x = element_text(angle = 90,hjust = 1)))

##################################
# Formulating the Biplot
# using the individual scores
# and descriptor variable loadings
# for the Top 1 and Top 2 principal components
##################################
(DR_PCA_Biplot <- fviz_pca_biplot(DR_PCA,
label = c("var","ind"),
repel = TRUE,
habillage = SD_PCA$Picture,
palette = c("#888888","#5544FF"),
col.var = "#FF5050",
legend.title = "Picture",
addEllipses = FALSE) +
labs(title = "Principal Component Analysis : Biplot",
subtitle = "Individual Scores versus Descriptor Variable Loadings",
x = "Principal Component 1",
y = "Principal Component 2") +
theme_classic() +
theme(legend.position="top"))

1.5.2 Correspondence Analysis (CA)
##################################
# Loading dataset for Cinematography
##################################
SD_CA_Cinematography <- Oscars.CA.Cinematography
str(SD_CA_Cinematography)
## 'data.frame': 16 obs. of 3 variables:
## $ NNOM: int 3 1 8 17 3 22 2 8 2 2 ...
## $ NOM : int 1 0 3 4 2 2 0 6 1 1 ...
## $ WON : int 0 2 1 1 0 1 0 0 0 0 ...
balloonplot(t(as.table(as.matrix(SD_CA_Cinematography))),
main="Cinematography by Film Genre",
xlab="Cinematography",
ylab="Genre",
label=FALSE,
show.margins=FALSE)

##################################
# Conducting a Chi-Square Test
##################################
(SD_CA_Cinematography_CHSQ <- chisq.test(SD_CA_Cinematography))
##
## Pearson's Chi-squared test
##
## data: SD_CA_Cinematography
## X-squared = 53.85, df = 30, p-value = 0.004777
##################################
# Performing CA
##################################
DR_CA_Cinematography <- CA(SD_CA_Cinematography,
graph = FALSE)
##################################
# Obtaining the CA eigenvalues
##################################
(DR_CA_Cinematography_EV <- get_eigenvalue(DR_CA_Cinematography))
## eigenvalue variance.percent cumulative.variance.percent
## Dim.1 0.2992796 70.02615 70.02615
## Dim.2 0.1281030 29.97385 100.00000
##################################
# Formulating the Scree Plot
# for the variances
##################################
(DR_CA_Cinematography_VarianceScreePlot <- fviz_eig(DR_CA_Cinematography,
addlabels = TRUE,
ylim = c(0, 100),
choice = c("variance")) +
labs(title = "Correspondence Analysis : Scree Plot of Explained Variances (Cinematography)",
subtitle = "Top 2 Principal Components",
y = "Percentage of Explained Variances",
x = "Principal Components") +
theme_classic())

##################################
# Formulating the Scree Plot
# for the eigenvalues
##################################
(DR_CA_Cinematography_EigenvalueScreePlot <- fviz_eig(DR_CA_Cinematography,
addlabels = TRUE,
ylim = c(0, 6),
choice = c("eigenvalue")) +
labs(title = "Correspondence Analysis : Scree Plot of Eigenvalues (Cinematography)",
subtitle = "Top 2 Principal Components",
y = "Eigenvalues",
x = "Principal Components") +
theme_classic())

##################################
# Extracting the CA
# results for row descriptor variables
##################################
(DR_CA_Cinematography_ROW <- get_ca_row(DR_CA_Cinematography))
## Correspondence Analysis - Results for rows
## ===================================================
## Name Description
## 1 "$coord" "Coordinates for the rows"
## 2 "$cos2" "Cos2 for the rows"
## 3 "$contrib" "contributions of the rows"
## 4 "$inertia" "Inertia of the rows"
##################################
# Extracting the coordinates
# representing the loadings
# for the row descriptor variabless
##################################
DR_CA_Cinematography_ROW$coord
## Dim 1 Dim 2
## Action -0.35048211 0.041667147
## Adventure 1.61952041 -0.927278781
## Biography -0.07606662 -0.012566854
## Comedy -0.26225224 -0.133844287
## Crime -0.21528924 0.362711502
## Drama -0.37198127 -0.348215442
## Family -0.57580357 -0.493406778
## History -0.18953822 0.423862808
## LGBTQ+ -0.27537496 0.220025122
## Music -0.27537496 0.220025122
## Musical 0.82228703 0.003089121
## Romance -0.21528924 0.362711502
## Sci-Fi 1.08803149 -0.307033513
## Thriller -0.27537496 0.220025122
## War 0.97250133 0.359805071
## Western 0.92340729 0.949112996
##################################
# Extracting the quality of representation
# for the row descriptor variables
# on the factor map
##################################
DR_CA_Cinematography_ROW$cos2
## Dim 1 Dim 2
## Action 0.9860633 1.393672e-02
## Adventure 0.7531088 2.468912e-01
## Biography 0.9734313 2.656869e-02
## Comedy 0.7933535 2.066465e-01
## Crime 0.2605233 7.394767e-01
## Drama 0.5329632 4.670368e-01
## Family 0.5766083 4.233917e-01
## History 0.1666388 8.333612e-01
## LGBTQ+ 0.6103501 3.896499e-01
## Music 0.6103501 3.896499e-01
## Musical 0.9999859 1.411291e-05
## Romance 0.2605233 7.394767e-01
## Sci-Fi 0.9262414 7.375862e-02
## Thriller 0.6103501 3.896499e-01
## War 0.8795969 1.204031e-01
## Western 0.4862747 5.137253e-01
##################################
# Extracting the contributions
# (in percentage) of the row descriptor variables
# to the principal components
##################################
DR_CA_Cinematography_ROW$contrib
## Dim 1 Dim 2
## Action 1.3029990 4.302467e-02
## Adventure 20.8663496 1.598129e+01
## Biography 0.1841288 1.174098e-02
## Comedy 4.0124854 2.441701e+00
## Crime 0.6145639 4.075329e+00
## Drama 9.1734884 1.878046e+01
## Family 1.7584543 3.016548e+00
## History 1.3337486 1.558292e+01
## LGBTQ+ 0.6032850 8.997790e-01
## Music 0.6032850 8.997790e-01
## Musical 5.3792349 1.773622e-04
## Romance 0.6145639 4.075329e+00
## Sci-Fi 28.2538615 5.256348e+00
## Thriller 1.2065700 1.799558e+00
## War 15.0481819 4.812335e+00
## Western 9.0447999 2.232369e+01
##################################
# Extracting the contribution
# of the row descriptor variables
# for the the Top 1 and 2 principal components
##################################
(DR_CA_Cinematography_ROW_PCVariableContributors <- fviz_contrib(DR_CA_Cinematography,
choice = "row",
axes = 1:2) +
labs(title = "Correspondence Analysis : Row Descriptor Variable Contribution (Cinematography)",
subtitle = "Principal Components 1 and 2 Contributors",
x = "Descriptors") +
theme_classic() +
theme(axis.text.x = element_text(angle = 90,hjust = 1)))

##################################
# Extracting the contribution
# of the row descriptor variables
# for the the Top 1 principal component
##################################
(DR_CA_Cinematography_ROW_PC1VariableContributors <- fviz_contrib(DR_CA_Cinematography,
choice = "row",
axes = 1) +
labs(title = "Correspondence Analysis : Row Descriptor Variable Contribution (Cinematography)",
subtitle = "Principal Component 1 Contributors",
x = "Descriptors") +
theme_classic() +
theme(axis.text.x = element_text(angle = 90,hjust = 1)))

##################################
# Extracting the contribution
# of the row descriptor variables
# for the the Top 2 principal component
##################################
(DR_CA_Cinematography_ROW_PC2VariableContributors <- fviz_contrib(DR_CA_Cinematography,
choice = "row",
axes = 2) +
labs(title = "Correspondence Analysis : Row Descriptor Variable Contribution (Cinematography)",
subtitle = "Principal Component 2 Contributors",
x = "Descriptors") +
theme_classic() +
theme(axis.text.x = element_text(angle = 90,hjust = 1)))

##################################
# Extracting the CA
# results for column descriptor variables
##################################
(DR_CA_Cinematography_COL <- get_ca_col(DR_CA_Cinematography))
## Correspondence Analysis - Results for columns
## ===================================================
## Name Description
## 1 "$coord" "Coordinates for the columns"
## 2 "$cos2" "Cos2 for the columns"
## 3 "$contrib" "contributions of the columns"
## 4 "$inertia" "Inertia of the columns"
##################################
# Extracting the coordinates
# representing the loadings
# for the column descriptor variabless
##################################
DR_CA_Cinematography_COL$coord
## Dim 1 Dim 2
## NNOM -0.3150017 -0.1765976
## NOM 0.1780598 0.5894460
## WON 1.4864741 -0.4095314
##################################
# Extracting the quality of representation
# for the column descriptor variables
# on the factor map
##################################
DR_CA_Cinematography_COL$cos2
## Dim 1 Dim 2
## NNOM 0.76086152 0.23913848
## NOM 0.08362161 0.91637839
## WON 0.92945168 0.07054832
##################################
# Extracting the contributions
# (in percentage) of the column descriptor variables
# to the principal components
##################################
DR_CA_Cinematography_COL$contrib
## Dim 1 Dim 2
## NNOM 21.050778 15.45716
## NOM 2.774585 71.03494
## WON 76.174637 13.50790
##################################
# Extracting the contribution
# of the column descriptor variables
# for the the Top 1 and 2 principal components
##################################
(DR_CA_Cinematography_COL_PCVariableContributors <- fviz_contrib(DR_CA_Cinematography,
choice = "col",
axes = 1:2) +
labs(title = "Correspondence Analysis : Column Descriptor Variable Contribution (Cinematography)",
subtitle = "Principal Components 1 and 2 Contributors",
x = "Descriptors") +
theme_classic() +
theme(axis.text.x = element_text(angle = 90,hjust = 1)))

##################################
# Extracting the contribution
# of the column descriptor variables
# for the the Top 1 principal component
##################################
(DR_CA_Cinematography_COL_PC1VariableContributors <- fviz_contrib(DR_CA_Cinematography,
choice = "col",
axes = 1) +
labs(title = "Correspondence Analysis : Column Descriptor Variable Contribution (Cinematography)",
subtitle = "Principal Component 1 Contributors",
x = "Descriptors") +
theme_classic() +
theme(axis.text.x = element_text(angle = 90,hjust = 1)))

##################################
# Extracting the contribution
# of the column descriptor variables
# for the the Top 2 principal component
##################################
(DR_CA_Cinematography_COL_PC2VariableContributors <- fviz_contrib(DR_CA_Cinematography,
choice = "col",
axes = 2) +
labs(title = "Correspondence Analysis : Column Descriptor Variable Contribution (Cinematography)",
subtitle = "Principal Component 2 Contributors",
x = "Descriptors") +
theme_classic() +
theme(axis.text.x = element_text(angle = 90,hjust = 1)))

##################################
# Formulating the Symmetric Biplot
# using the row and column
# descriptor variable loadings
# for the Top 1 and Top 2 principal components
##################################
(DR_CA_SymmetricBiplot <- fviz_ca_biplot(DR_CA_Cinematography,
label="all",
geom = "text",
repel = TRUE) +
labs(title = "Correspondence Analysis : Symmetric Biplot (Cinematography)",
subtitle = "Row versus Column Descriptor Variable Loadings",
x = "Principal Component 1",
y = "Principal Component 2") +
theme_classic())

##################################
# Formulating the Asymmetric Biplot
# using the row and column
# descriptor variable loadings
# for the Top 1 and Top 2 principal components
##################################
(DR_CA_SymmetricBiplot <- fviz_ca_biplot(DR_CA_Cinematography,
label="all",
geom = "text",
repel = TRUE,
map = "symbiplot") +
labs(title = "Correspondence Analysis : Asymmetric Biplot (Cinematography)",
subtitle = "Row versus Column Descriptor Variable Loadings",
x = "Principal Component 1",
y = "Principal Component 2") +
theme_classic())

##################################
# Loading dataset for Directing
##################################
SD_CA_Directing <- Oscars.CA.Directing
str(SD_CA_Directing)
## 'data.frame': 16 obs. of 3 variables:
## $ NNOM: int 3 0 8 6 2 12 2 10 2 2 ...
## $ NOM : int 1 2 4 14 3 11 0 3 1 1 ...
## $ WON : int 0 1 0 2 0 2 0 1 0 0 ...
balloonplot(t(as.table(as.matrix(SD_CA_Directing))),
main="Directing by Film Genre",
xlab="Directing",
ylab="Genre",
label=FALSE,
show.margins=FALSE)

##################################
# Conducting a Chi-Square Test
##################################
(SD_CA_Directing_CHSQ <- chisq.test(SD_CA_Directing))
##
## Pearson's Chi-squared test
##
## data: SD_CA_Directing
## X-squared = 32.369, df = 30, p-value = 0.3506
##################################
# Performing CA
##################################
DR_CA_Directing <- CA(SD_CA_Directing,
graph = FALSE)
##################################
# Obtaining the CA eigenvalues
##################################
(DR_CA_Directing_EV <- get_eigenvalue(DR_CA_Directing))
## eigenvalue variance.percent cumulative.variance.percent
## Dim.1 0.16679328 64.92707 64.92707
## Dim.2 0.09009999 35.07293 100.00000
##################################
# Formulating the Scree Plot
# for the variances
##################################
(DR_CA_Directing_VarianceScreePlot <- fviz_eig(DR_CA_Directing,
addlabels = TRUE,
ylim = c(0, 100),
choice = c("variance")) +
labs(title = "Correspondence Analysis : Scree Plot of Explained Variances (Directing)",
subtitle = "Top 2 Principal Components",
y = "Percentage of Explained Variances",
x = "Principal Components") +
theme_classic())

##################################
# Formulating the Scree Plot
# for the eigenvalues
##################################
(DR_CA_Directing_EigenvalueScreePlot <- fviz_eig(DR_CA_Directing,
addlabels = TRUE,
ylim = c(0, 6),
choice = c("eigenvalue")) +
labs(title = "Correspondence Analysis : Scree Plot of Eigenvalues (Directing)",
subtitle = "Top 2 Principal Components",
y = "Eigenvalues",
x = "Principal Components") +
theme_classic())

##################################
# Extracting the CA
# results for row descriptor variables
##################################
(DR_CA_Directing_ROW <- get_ca_row(DR_CA_Directing))
## Correspondence Analysis - Results for rows
## ===================================================
## Name Description
## 1 "$coord" "Coordinates for the rows"
## 2 "$cos2" "Cos2 for the rows"
## 3 "$contrib" "contributions of the rows"
## 4 "$inertia" "Inertia of the rows"
##################################
# Extracting the coordinates
# representing the loadings
# for the row descriptor variabless
##################################
DR_CA_Directing_ROW$coord
## Dim 1 Dim 2
## Action -0.59093674 0.025492372
## Adventure 1.02503716 0.076828292
## Biography -0.45370496 -0.086344034
## Comedy 0.29890560 -0.326516979
## Crime -0.01456328 -0.444220534
## Drama -0.05489268 -0.082970326
## Family -1.00263207 0.361001590
## History -0.45050428 0.204247535
## LGBTQ+ -0.45370496 -0.086344034
## Music -0.45370496 -0.086344034
## Musical 0.47611005 0.524173917
## Romance 0.21396947 0.190497611
## Sci-Fi -0.01680399 0.469783141
## Thriller 0.28566610 -0.004757871
## War 0.56012965 -0.228430683
## Western 0.80378579 0.941269298
##################################
# Extracting the quality of representation
# for the row descriptor variables
# on the factor map
##################################
DR_CA_Directing_ROW$cos2
## Dim 1 Dim 2
## Action 0.998142490 0.001857510
## Adventure 0.994413624 0.005586376
## Biography 0.965048427 0.034951573
## Comedy 0.455937517 0.544062483
## Crime 0.001073629 0.998926371
## Drama 0.304447910 0.695552090
## Family 0.885238705 0.114761295
## History 0.829497458 0.170502542
## LGBTQ+ 0.965048427 0.034951573
## Music 0.965048427 0.034951573
## Musical 0.452060467 0.547939533
## Romance 0.557836791 0.442163209
## Sci-Fi 0.001277834 0.998722166
## Thriller 0.999722676 0.000277324
## War 0.857401211 0.142598789
## Western 0.421701375 0.578298625
##################################
# Extracting the contributions
# (in percentage) of the row descriptor variables
# to the principal components
##################################
DR_CA_Directing_ROW$contrib
## Dim 1 Dim 2
## Action 6.646498031 0.022897349
## Adventure 14.998622769 0.155979791
## Biography 11.753824740 0.788044317
## Comedy 9.352800609 20.660410302
## Crime 0.005045905 8.691044740
## Drama 0.358442786 1.515969009
## Family 9.566742526 2.295899889
## History 13.520003083 5.144538317
## LGBTQ+ 2.938456185 0.197011079
## Music 2.938456185 0.197011079
## Musical 3.235838706 7.260671511
## Romance 1.089242375 1.598283421
## Sci-Fi 0.012092556 17.496133580
## Thriller 2.329804971 0.001196414
## War 8.957344701 2.757813226
## Western 12.296783873 31.217095975
##################################
# Extracting the contribution
# of the row descriptor variables
# for the the Top 1 and 2 principal components
##################################
(DR_CA_Directing_ROW_PCVariableContributors <- fviz_contrib(DR_CA_Directing,
choice = "row",
axes = 1:2) +
labs(title = "Correspondence Analysis : Row Descriptor Variable Contribution (Directing)",
subtitle = "Principal Components 1 and 2 Contributors",
x = "Descriptors") +
theme_classic() +
theme(axis.text.x = element_text(angle = 90,hjust = 1)))

##################################
# Extracting the contribution
# of the row descriptor variables
# for the the Top 1 principal component
##################################
(DR_CA_Directing_ROW_PC1VariableContributors <- fviz_contrib(DR_CA_Directing,
choice = "row",
axes = 1) +
labs(title = "Correspondence Analysis : Row Descriptor Variable Contribution (Directing)",
subtitle = "Principal Component 1 Contributors",
x = "Descriptors") +
theme_classic() +
theme(axis.text.x = element_text(angle = 90,hjust = 1)))

##################################
# Extracting the contribution
# of the row descriptor variables
# for the the Top 2 principal component
##################################
(DR_CA_Directing_ROW_PC2VariableContributors <- fviz_contrib(DR_CA_Directing,
choice = "row",
axes = 2) +
labs(title = "Correspondence Analysis : Row Descriptor Variable Contribution (Directing)",
subtitle = "Principal Component 2 Contributors",
x = "Descriptors") +
theme_classic() +
theme(axis.text.x = element_text(angle = 90,hjust = 1)))

##################################
# Extracting the CA
# results for column descriptor variables
##################################
(DR_CA_Directing_COL <- get_ca_col(DR_CA_Directing))
## Correspondence Analysis - Results for columns
## ===================================================
## Name Description
## 1 "$coord" "Coordinates for the columns"
## 2 "$cos2" "Cos2 for the columns"
## 3 "$contrib" "contributions of the columns"
## 4 "$inertia" "Inertia of the columns"
##################################
# Extracting the coordinates
# representing the loadings
# for the column descriptor variabless
##################################
DR_CA_Directing_COL$coord
## Dim 1 Dim 2
## NNOM -0.4094783 0.1083606
## NOM 0.2630727 -0.2944740
## WON 0.7297404 0.6581319
##################################
# Extracting the quality of representation
# for the column descriptor variables
# on the factor map
##################################
DR_CA_Directing_COL$cos2
## Dim 1 Dim 2
## NNOM 0.9345536 0.06544635
## NOM 0.4438576 0.55614244
## WON 0.5514589 0.44854113
##################################
# Extracting the contributions
# (in percentage) of the column descriptor variables
# to the principal components
##################################
DR_CA_Directing_COL$contrib
## Dim 1 Dim 2
## NNOM 47.07222 6.102386
## NOM 17.45333 40.483176
## WON 35.47445 53.414437
##################################
# Extracting the contribution
# of the column descriptor variables
# for the the Top 1 and 2 principal components
##################################
(DR_CA_Directing_COL_PCVariableContributors <- fviz_contrib(DR_CA_Directing,
choice = "col",
axes = 1:2) +
labs(title = "Correspondence Analysis : Column Descriptor Variable Contribution (Directing)",
subtitle = "Principal Components 1 and 2 Contributors",
x = "Descriptors") +
theme_classic() +
theme(axis.text.x = element_text(angle = 90,hjust = 1)))

##################################
# Extracting the contribution
# of the column descriptor variables
# for the the Top 1 principal component
##################################
(DR_CA_Directing_COL_PC1VariableContributors <- fviz_contrib(DR_CA_Directing,
choice = "col",
axes = 1) +
labs(title = "Correspondence Analysis : Column Descriptor Variable Contribution (Directing)",
subtitle = "Principal Component 1 Contributors",
x = "Descriptors") +
theme_classic() +
theme(axis.text.x = element_text(angle = 90,hjust = 1)))

##################################
# Extracting the contribution
# of the column descriptor variables
# for the the Top 2 principal component
##################################
(DR_CA_Directing_COL_PC2VariableContributors <- fviz_contrib(DR_CA_Directing,
choice = "col",
axes = 2) +
labs(title = "Correspondence Analysis : Column Descriptor Variable Contribution (Directing)",
subtitle = "Principal Component 2 Contributors",
x = "Descriptors") +
theme_classic() +
theme(axis.text.x = element_text(angle = 90,hjust = 1)))

##################################
# Formulating the Symmetric Biplot
# using the row and column
# descriptor variable loadings
# for the Top 1 and Top 2 principal components
##################################
(DR_CA_SymmetricBiplot <- fviz_ca_biplot(DR_CA_Directing,
label="all",
geom = "text",
repel = TRUE) +
labs(title = "Correspondence Analysis : Symmetric Biplot (Directing)",
subtitle = "Row versus Column Descriptor Variable Loadings",
x = "Principal Component 1",
y = "Principal Component 2") +
theme_classic())

##################################
# Formulating the Asymmetric Biplot
# using the row and column
# descriptor variable loadings
# for the Top 1 and Top 2 principal components
##################################
(DR_CA_SymmetricBiplot <- fviz_ca_biplot(DR_CA_Directing,
label="all",
geom = "text",
repel = TRUE,
map = "symbiplot") +
labs(title = "Correspondence Analysis : Asymmetric Biplot (Directing)",
subtitle = "Row versus Column Descriptor Variable Loadings",
x = "Principal Component 1",
y = "Principal Component 2") +
theme_classic())

##################################
# Loading dataset for Editing
##################################
SD_CA_Editing <- Oscars.CA.Editing
str(SD_CA_Editing)
## 'data.frame': 16 obs. of 3 variables:
## $ NNOM: int 1 1 4 11 0 19 2 7 1 0 ...
## $ NOM : int 2 2 6 10 5 6 0 5 2 1 ...
## $ WON : int 1 0 2 1 0 0 0 2 0 2 ...
balloonplot(t(as.table(as.matrix(SD_CA_Editing))),
main="Editing by Film Genre",
xlab="Editing",
ylab="Genre",
label=FALSE,
show.margins=FALSE)

##################################
# Conducting a Chi-Square Test
##################################
(SD_CA_Editing_CHSQ <- chisq.test(SD_CA_Editing))
##
## Pearson's Chi-squared test
##
## data: SD_CA_Editing
## X-squared = 50.873, df = 30, p-value = 0.01005
##################################
# Performing CA
##################################
DR_CA_Editing <- CA(SD_CA_Editing,
graph = FALSE)
##################################
# Obtaining the CA eigenvalues
##################################
(DR_CA_Editing_EV <- get_eigenvalue(DR_CA_Editing))
## eigenvalue variance.percent cumulative.variance.percent
## Dim.1 0.2584184 64.00355 64.00355
## Dim.2 0.1453380 35.99645 100.00000
##################################
# Formulating the Scree Plot
# for the variances
##################################
(DR_CA_Editing_VarianceScreePlot <- fviz_eig(DR_CA_Editing,
addlabels = TRUE,
ylim = c(0, 100),
choice = c("variance")) +
labs(title = "Correspondence Analysis : Scree Plot of Explained Variances (Editing)",
subtitle = "Top 2 Principal Components",
y = "Percentage of Explained Variances",
x = "Principal Components") +
theme_classic())

##################################
# Formulating the Scree Plot
# for the eigenvalues
##################################
(DR_CA_Editing_EigenvalueScreePlot <- fviz_eig(DR_CA_Editing,
addlabels = TRUE,
ylim = c(0, 6),
choice = c("eigenvalue")) +
labs(title = "Correspondence Analysis : Scree Plot of Eigenvalues (Editing)",
subtitle = "Top 2 Principal Components",
y = "Eigenvalues",
x = "Principal Components") +
theme_classic())

##################################
# Extracting the CA
# results for row descriptor variables
##################################
(DR_CA_Editing_ROW <- get_ca_row(DR_CA_Editing))
## Correspondence Analysis - Results for rows
## ===================================================
## Name Description
## 1 "$coord" "Coordinates for the rows"
## 2 "$cos2" "Cos2 for the rows"
## 3 "$contrib" "contributions of the rows"
## 4 "$inertia" "Inertia of the rows"
##################################
# Extracting the coordinates
# representing the loadings
# for the row descriptor variabless
##################################
DR_CA_Editing_ROW$coord
## Dim 1 Dim 2
## Action 0.56704105 -0.13820733
## Adventure -0.16441566 -0.52858184
## Biography 0.28198554 -0.15808970
## Comedy -0.16637971 -0.09681112
## Crime 0.08300400 -1.19003666
## Drama -0.48111282 0.31808033
## Family -0.65925497 0.79432780
## History 0.09450411 0.11971026
## LGBTQ+ -0.16441566 -0.52858184
## Music 1.86860879 0.29193192
## Musical -0.41183532 0.13287298
## Romance -0.51080318 0.39745491
## Sci-Fi 0.43078594 -0.03459231
## Thriller -0.16441566 -0.52858184
## War 1.17478794 0.58289460
## Western -0.28812549 -0.19785443
##################################
# Extracting the quality of representation
# for the row descriptor variables
# on the factor map
##################################
DR_CA_Editing_ROW$cos2
## Dim 1 Dim 2
## Action 0.943924837 0.056075163
## Adventure 0.088217200 0.911782800
## Biography 0.760857495 0.239142505
## Comedy 0.747065329 0.252934671
## Crime 0.004841385 0.995158615
## Drama 0.695846312 0.304153688
## Family 0.407871452 0.592128548
## History 0.383939061 0.616060939
## LGBTQ+ 0.088217200 0.911782800
## Music 0.976173866 0.023826134
## Musical 0.905719893 0.094280107
## Romance 0.622883592 0.377116408
## Sci-Fi 0.993593147 0.006406853
## Thriller 0.088217200 0.911782800
## War 0.802449252 0.197550748
## Western 0.679555338 0.320444662
##################################
# Extracting the contributions
# (in percentage) of the row descriptor variables
# to the principal components
##################################
DR_CA_Editing_ROW$contrib
## Dim 1 Dim 2
## Action 3.9499807 0.41722700
## Adventure 0.2490655 4.57715913
## Biography 2.9304944 1.63771622
## Comedy 1.8703779 1.12596125
## Crime 0.1057972 38.66704312
## Drama 17.7721450 13.81223319
## Family 2.6695792 6.89096101
## History 0.3840037 1.09557242
## LGBTQ+ 0.2490655 4.57715913
## Music 32.1709592 1.39615846
## Musical 1.5626957 0.28923102
## Romance 4.0066662 4.31316430
## Sci-Fi 5.1294586 0.05881011
## Thriller 0.4981310 9.15431827
## War 25.4317465 11.13221503
## Western 1.0198336 0.85507033
##################################
# Extracting the contribution
# of the row descriptor variables
# for the the Top 1 and 2 principal components
##################################
(DR_CA_Editing_ROW_PCVariableContributors <- fviz_contrib(DR_CA_Editing,
choice = "row",
axes = 1:2) +
labs(title = "Correspondence Analysis : Row Descriptor Variable Contribution (Editing)",
subtitle = "Principal Components 1 and 2 Contributors",
x = "Descriptors") +
theme_classic() +
theme(axis.text.x = element_text(angle = 90,hjust = 1)))

##################################
# Extracting the contribution
# of the row descriptor variables
# for the the Top 1 principal component
##################################
(DR_CA_Editing_ROW_PC1VariableContributors <- fviz_contrib(DR_CA_Editing,
choice = "row",
axes = 1) +
labs(title = "Correspondence Analysis : Row Descriptor Variable Contribution (Editing)",
subtitle = "Principal Component 1 Contributors",
x = "Descriptors") +
theme_classic() +
theme(axis.text.x = element_text(angle = 90,hjust = 1)))

##################################
# Extracting the contribution
# of the row descriptor variables
# for the the Top 2 principal component
##################################
(DR_CA_Editing_ROW_PC2VariableContributors <- fviz_contrib(DR_CA_Editing,
choice = "row",
axes = 2) +
labs(title = "Correspondence Analysis : Row Descriptor Variable Contribution (Editing)",
subtitle = "Principal Component 2 Contributors",
x = "Descriptors") +
theme_classic() +
theme(axis.text.x = element_text(angle = 90,hjust = 1)))

##################################
# Extracting the CA
# results for column descriptor variables
##################################
(DR_CA_Editing_COL <- get_ca_col(DR_CA_Editing))
## Correspondence Analysis - Results for columns
## ===================================================
## Name Description
## 1 "$coord" "Coordinates for the columns"
## 2 "$cos2" "Cos2 for the columns"
## 3 "$contrib" "contributions of the columns"
## 4 "$inertia" "Inertia of the columns"
##################################
# Extracting the coordinates
# representing the loadings
# for the column descriptor variabless
##################################
DR_CA_Editing_COL$coord
## Dim 1 Dim 2
## NNOM -0.33513143 0.3028233
## NOM 0.04219498 -0.4536803
## WON 1.40375986 0.3937809
##################################
# Extracting the quality of representation
# for the column descriptor variables
# on the factor map
##################################
DR_CA_Editing_COL$cos2
## Dim 1 Dim 2
## NNOM 0.550513662 0.44948634
## NOM 0.008575927 0.99142407
## WON 0.927049751 0.07295025
##################################
# Extracting the contributions
# (in percentage) of the column descriptor variables
# to the principal components
##################################
DR_CA_Editing_COL$contrib
## Dim 1 Dim 2
## NNOM 21.0409876 30.54631
## NOM 0.2843353 58.44582
## WON 78.6746771 11.00786
##################################
# Extracting the contribution
# of the column descriptor variables
# for the the Top 1 and 2 principal components
##################################
(DR_CA_Editing_COL_PCVariableContributors <- fviz_contrib(DR_CA_Editing,
choice = "col",
axes = 1:2) +
labs(title = "Correspondence Analysis : Column Descriptor Variable Contribution (Editing)",
subtitle = "Principal Components 1 and 2 Contributors",
x = "Descriptors") +
theme_classic() +
theme(axis.text.x = element_text(angle = 90,hjust = 1)))

##################################
# Extracting the contribution
# of the column descriptor variables
# for the the Top 1 principal component
##################################
(DR_CA_Editing_COL_PC1VariableContributors <- fviz_contrib(DR_CA_Editing,
choice = "col",
axes = 1) +
labs(title = "Correspondence Analysis : Column Descriptor Variable Contribution (Editing)",
subtitle = "Principal Component 1 Contributors",
x = "Descriptors") +
theme_classic() +
theme(axis.text.x = element_text(angle = 90,hjust = 1)))

##################################
# Extracting the contribution
# of the column descriptor variables
# for the the Top 2 principal component
##################################
(DR_CA_Editing_COL_PC2VariableContributors <- fviz_contrib(DR_CA_Editing,
choice = "col",
axes = 2) +
labs(title = "Correspondence Analysis : Column Descriptor Variable Contribution (Editing)",
subtitle = "Principal Component 2 Contributors",
x = "Descriptors") +
theme_classic() +
theme(axis.text.x = element_text(angle = 90,hjust = 1)))

##################################
# Formulating the Symmetric Biplot
# using the row and column
# descriptor variable loadings
# for the Top 1 and Top 2 principal components
##################################
(DR_CA_SymmetricBiplot <- fviz_ca_biplot(DR_CA_Editing,
label="all",
geom = "text",
repel = TRUE) +
labs(title = "Correspondence Analysis : Symmetric Biplot (Editing)",
subtitle = "Row versus Column Descriptor Variable Loadings",
x = "Principal Component 1",
y = "Principal Component 2") +
theme_classic())

##################################
# Formulating the Asymmetric Biplot
# using the row and column
# descriptor variable loadings
# for the Top 1 and Top 2 principal components
##################################
(DR_CA_SymmetricBiplot <- fviz_ca_biplot(DR_CA_Editing,
label="all",
geom = "text",
repel = TRUE,
map = "symbiplot") +
labs(title = "Correspondence Analysis : Asymmetric Biplot (Editing)",
subtitle = "Row versus Column Descriptor Variable Loadings",
x = "Principal Component 1",
y = "Principal Component 2") +
theme_classic())

##################################
# Loading dataset for Screenplay
##################################
SD_CA_Screenplay <- Oscars.CA.Screenplay
str(SD_CA_Screenplay)
## 'data.frame': 16 obs. of 3 variables:
## $ NNOM: int 2 0 4 0 0 3 0 5 0 0 ...
## $ NOM : int 2 3 7 14 5 15 2 5 1 3 ...
## $ WON : int 0 0 1 8 0 7 0 4 2 0 ...
balloonplot(t(as.table(as.matrix(SD_CA_Screenplay))),
main="Screenplay by Film Genre",
xlab="Screenplay",
ylab="Genre",
label=FALSE,
show.margins=FALSE)

##################################
# Conducting a Chi-Square Test
##################################
(SD_CA_Screenplay_CHSQ <- chisq.test(SD_CA_Screenplay))
##
## Pearson's Chi-squared test
##
## data: SD_CA_Screenplay
## X-squared = 41.247, df = 30, p-value = 0.08289
##################################
# Performing CA
##################################
DR_CA_Screenplay <- CA(SD_CA_Screenplay,
graph = FALSE)
##################################
# Obtaining the CA eigenvalues
##################################
(DR_CA_Screenplay_EV <- get_eigenvalue(DR_CA_Screenplay))
## eigenvalue variance.percent cumulative.variance.percent
## Dim.1 0.1831592 55.95135 55.95135
## Dim.2 0.1441952 44.04865 100.00000
##################################
# Formulating the Scree Plot
# for the variances
##################################
(DR_CA_Screenplay_VarianceScreePlot <- fviz_eig(DR_CA_Screenplay,
addlabels = TRUE,
ylim = c(0, 100),
choice = c("variance")) +
labs(title = "Correspondence Analysis : Scree Plot of Explained Variances (Screenplay)",
subtitle = "Top 2 Principal Components",
y = "Percentage of Explained Variances",
x = "Principal Components") +
theme_classic())

##################################
# Formulating the Scree Plot
# for the eigenvalues
##################################
(DR_CA_Screenplay_EigenvalueScreePlot <- fviz_eig(DR_CA_Screenplay,
addlabels = TRUE,
ylim = c(0, 6),
choice = c("eigenvalue")) +
labs(title = "Correspondence Analysis : Scree Plot of Eigenvalues (Screenplay)",
subtitle = "Top 2 Principal Components",
y = "Eigenvalues",
x = "Principal Components") +
theme_classic())

##################################
# Extracting the CA
# results for row descriptor variables
##################################
(DR_CA_Screenplay_ROW <- get_ca_row(DR_CA_Screenplay))
## Correspondence Analysis - Results for rows
## ===================================================
## Name Description
## 1 "$coord" "Coordinates for the rows"
## 2 "$cos2" "Cos2 for the rows"
## 3 "$contrib" "contributions of the rows"
## 4 "$inertia" "Inertia of the rows"
##################################
# Extracting the coordinates
# representing the loadings
# for the row descriptor variabless
##################################
DR_CA_Screenplay_ROW$coord
## Dim 1 Dim 2
## Action 0.81234217 -0.17778369
## Adventure -0.33965391 -0.78144905
## Biography 0.38428571 -0.17579324
## Comedy -0.53190601 0.10529527
## Crime -0.33965391 -0.78144905
## Drama -0.21120897 0.04622376
## Family -0.33965391 -0.78144905
## History 0.33214522 0.34646817
## LGBTQ+ -0.69211609 0.84424886
## Music -0.33965391 -0.78144905
## Musical 1.19634087 0.02343809
## Romance 0.01540587 -0.05227353
## Sci-Fi 0.42834348 -0.37900548
## Thriller 0.16399685 0.84026796
## War 0.34022793 0.02741900
## Western 0.10417082 0.13002035
##################################
# Extracting the quality of representation
# for the row descriptor variables
# on the factor map
##################################
DR_CA_Screenplay_ROW$cos2
## Dim 1 Dim 2
## Action 0.95429251 0.045707489
## Adventure 0.15889866 0.841101339
## Biography 0.82694884 0.173051161
## Comedy 0.96229022 0.037709779
## Crime 0.15889866 0.841101339
## Drama 0.95429251 0.045707489
## Family 0.15889866 0.841101339
## History 0.47890320 0.521096802
## LGBTQ+ 0.40194025 0.598059754
## Music 0.15889866 0.841101339
## Musical 0.99961632 0.000383679
## Romance 0.07991642 0.920083582
## Sci-Fi 0.56088375 0.439116254
## Thriller 0.03669438 0.963305625
## War 0.99354715 0.006452853
## Western 0.39095067 0.609049332
##################################
# Extracting the contributions
# (in percentage) of the row descriptor variables
# to the principal components
##################################
DR_CA_Screenplay_ROW$contrib
## Dim 1 Dim 2
## Action 11.437701279 0.695861002
## Adventure 1.499668464 10.083266568
## Biography 7.678748880 2.041099813
## Comedy 26.970762770 1.342514688
## Crime 2.499447440 16.805444281
## Drama 4.832428790 0.294001273
## Family 0.999778976 6.722177712
## History 6.692444844 9.249822006
## LGBTQ+ 6.227014798 11.769035837
## Music 1.499668464 10.083266568
## Musical 18.605094943 0.009070776
## Romance 0.005142132 0.075199052
## Sci-Fi 7.155295654 7.115610127
## Thriller 0.699234768 23.316615710
## War 3.009483353 0.024827508
## Western 0.188084443 0.372187078
##################################
# Extracting the contribution
# of the row descriptor variables
# for the the Top 1 and 2 principal components
##################################
(DR_CA_Screenplay_ROW_PCVariableContributors <- fviz_contrib(DR_CA_Screenplay,
choice = "row",
axes = 1:2) +
labs(title = "Correspondence Analysis : Row Descriptor Variable Contribution (Screenplay)",
subtitle = "Principal Components 1 and 2 Contributors",
x = "Descriptors") +
theme_classic() +
theme(axis.text.x = element_text(angle = 90,hjust = 1)))

##################################
# Extracting the contribution
# of the row descriptor variables
# for the the Top 1 principal component
##################################
(DR_CA_Screenplay_ROW_PC1VariableContributors <- fviz_contrib(DR_CA_Screenplay,
choice = "row",
axes = 1) +
labs(title = "Correspondence Analysis : Row Descriptor Variable Contribution (Screenplay)",
subtitle = "Principal Component 1 Contributors",
x = "Descriptors") +
theme_classic() +
theme(axis.text.x = element_text(angle = 90,hjust = 1)))

##################################
# Extracting the contribution
# of the row descriptor variables
# for the the Top 2 principal component
##################################
(DR_CA_Screenplay_ROW_PC2VariableContributors <- fviz_contrib(DR_CA_Screenplay,
choice = "row",
axes = 2) +
labs(title = "Correspondence Analysis : Row Descriptor Variable Contribution (Screenplay)",
subtitle = "Principal Component 2 Contributors",
x = "Descriptors") +
theme_classic() +
theme(axis.text.x = element_text(angle = 90,hjust = 1)))

##################################
# Extracting the CA
# results for column descriptor variables
##################################
(DR_CA_Screenplay_COL <- get_ca_col(DR_CA_Screenplay))
## Correspondence Analysis - Results for columns
## ===================================================
## Name Description
## 1 "$coord" "Coordinates for the columns"
## 2 "$cos2" "Cos2 for the columns"
## 3 "$contrib" "contributions of the columns"
## 4 "$inertia" "Inertia of the columns"
##################################
# Extracting the coordinates
# representing the loadings
# for the column descriptor variabless
##################################
DR_CA_Screenplay_COL$coord
## Dim 1 Dim 2
## NNOM 0.8406798 0.1617202
## NOM -0.1453620 -0.2967400
## WON -0.3716274 0.6292505
##################################
# Extracting the quality of representation
# for the column descriptor variables
# on the factor map
##################################
DR_CA_Screenplay_COL$cos2
## Dim 1 Dim 2
## NNOM 0.9643149 0.03568505
## NOM 0.1935263 0.80647369
## WON 0.2585967 0.74140333
##################################
# Extracting the contributions
# (in percentage) of the column descriptor variables
# to the principal components
##################################
DR_CA_Screenplay_COL$contrib
## Dim 1 Dim 2
## NNOM 76.560016 3.598714
## NOM 6.683833 35.379659
## WON 16.756151 61.021626
##################################
# Extracting the contribution
# of the column descriptor variables
# for the the Top 1 and 2 principal components
##################################
(DR_CA_Screenplay_COL_PCVariableContributors <- fviz_contrib(DR_CA_Screenplay,
choice = "col",
axes = 1:2) +
labs(title = "Correspondence Analysis : Column Descriptor Variable Contribution (Screenplay)",
subtitle = "Principal Components 1 and 2 Contributors",
x = "Descriptors") +
theme_classic() +
theme(axis.text.x = element_text(angle = 90,hjust = 1)))

##################################
# Extracting the contribution
# of the column descriptor variables
# for the the Top 1 principal component
##################################
(DR_CA_Screenplay_COL_PC1VariableContributors <- fviz_contrib(DR_CA_Screenplay,
choice = "col",
axes = 1) +
labs(title = "Correspondence Analysis : Column Descriptor Variable Contribution (Screenplay)",
subtitle = "Principal Component 1 Contributors",
x = "Descriptors") +
theme_classic() +
theme(axis.text.x = element_text(angle = 90,hjust = 1)))

##################################
# Extracting the contribution
# of the column descriptor variables
# for the the Top 2 principal component
##################################
(DR_CA_Screenplay_COL_PC2VariableContributors <- fviz_contrib(DR_CA_Screenplay,
choice = "col",
axes = 2) +
labs(title = "Correspondence Analysis : Column Descriptor Variable Contribution (Screenplay)",
subtitle = "Principal Component 2 Contributors",
x = "Descriptors") +
theme_classic() +
theme(axis.text.x = element_text(angle = 90,hjust = 1)))

##################################
# Formulating the Symmetric Biplot
# using the row and column
# descriptor variable loadings
# for the Top 1 and Top 2 principal components
##################################
(DR_CA_SymmetricBiplot <- fviz_ca_biplot(DR_CA_Screenplay,
label="all",
geom = "text",
repel = TRUE) +
labs(title = "Correspondence Analysis : Symmetric Biplot (Screenplay)",
subtitle = "Row versus Column Descriptor Variable Loadings",
x = "Principal Component 1",
y = "Principal Component 2") +
theme_classic())

##################################
# Formulating the Asymmetric Biplot
# using the row and column
# descriptor variable loadings
# for the Top 1 and Top 2 principal components
##################################
(DR_CA_SymmetricBiplot <- fviz_ca_biplot(DR_CA_Screenplay,
label="all",
geom = "text",
repel = TRUE,
map = "symbiplot") +
labs(title = "Correspondence Analysis : Asymmetric Biplot (Screenplay)",
subtitle = "Row versus Column Descriptor Variable Loadings",
x = "Principal Component 1",
y = "Principal Component 2") +
theme_classic())

1.5.3 Multiple Correspondence Analysis (MCA)
##################################
# Loading dataset
##################################
SD_MCA <- Oscars.MCA
str(SD_MCA)
## 'data.frame': 126 obs. of 6 variables:
## $ Cinematography: Factor w/ 3 levels "NNOM","NOM","WON": 3 1 1 1 2 2 1 1 1 1 ...
## $ Directing : Factor w/ 3 levels "NNOM","NOM","WON": 2 1 1 1 3 2 2 1 1 2 ...
## $ Editing : Factor w/ 3 levels "NNOM","NOM","WON": 2 1 2 1 3 2 2 1 1 1 ...
## $ Screenplay : Factor w/ 3 levels "NNOM","NOM","WON": 1 1 2 2 3 2 3 2 2 2 ...
## $ Year : Factor w/ 14 levels "2010","2011",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ Picture : Factor w/ 2 levels "NOM","WON": 1 1 1 1 2 1 1 1 1 1 ...
##################################
# Performing MCA
##################################
DR_MCA <- MCA(SD_MCA[,c(1:4)],
graph = FALSE)
##################################
# Obtaining the MCA eigenvalues
##################################
(DR_MCA_EV <- get_eigenvalue(DR_MCA))
## eigenvalue variance.percent cumulative.variance.percent
## Dim.1 0.4014500 20.072501 20.07250
## Dim.2 0.3482608 17.413038 37.48554
## Dim.3 0.2934227 14.671133 52.15667
## Dim.4 0.2814318 14.071589 66.22826
## Dim.5 0.2191032 10.955162 77.18342
## Dim.6 0.1864147 9.320736 86.50416
## Dim.7 0.1647588 8.237939 94.74210
## Dim.8 0.1051580 5.257902 100.00000
##################################
# Formulating the Scree Plot
# for the variances
##################################
(DR_MCA_VarianceScreePlot <- fviz_eig(DR_MCA,
addlabels = TRUE,
ylim = c(0, 100),
choice = c("variance")) +
labs(title = "Multiple Correspondence Analysis : Scree Plot of Explained Variances",
subtitle = "Top 2 Principal Components",
y = "Percentage of Explained Variances",
x = "Principal Components") +
theme_classic())

##################################
# Formulating the Scree Plot
# for the eigenvalues
##################################
(DR_MCA_EigenvalueScreePlot <- fviz_eig(DR_MCA,
addlabels = TRUE,
ylim = c(0, 6),
choice = c("eigenvalue")) +
labs(title = "Multiple Correspondence Analysis : Scree Plot of Eigenvalues",
subtitle = "Top 2 Principal Components",
y = "Eigenvalues",
x = "Principal Components") +
theme_classic())

##################################
# Extracting the MCA
# descriptor variable loadings
##################################
DR_MCA_VAR <- get_mca_var(DR_MCA)
##################################
# Extracting the coordinates
# representing the loadings
# for the descriptor variables
##################################
DR_MCA_VAR$coord
## Dim 1 Dim 2 Dim 3 Dim 4
## Cinematography_NNOM -0.5220182052 -0.1264760 0.12785667 0.38942121
## Cinematography_NOM 0.6927420866 -0.1325546 0.28326894 -1.19605797
## Cinematography_WON 1.4539205813 1.1147985 -1.50587760 0.63970892
## Directing_NNOM -0.6590730961 0.3732744 -0.19240706 0.23562263
## Directing_NOM 0.1678869817 -0.5334948 0.37577191 -0.49146594
## Directing_WON 2.1419501885 0.4465739 -0.61170677 0.86756855
## Editing_NNOM -0.5971330422 0.3091986 -0.24548152 -0.12141930
## Editing_NOM 0.5111151354 -0.7525933 -0.07146831 0.05150745
## Editing_WON 0.7574714257 1.5595180 1.43774806 0.36370692
## Screenplay_NNOM -0.0009048548 1.4299445 0.50282891 -0.74923832
## Screenplay_NOM -0.1076851566 -0.3010392 -0.60515086 -0.15377497
## Screenplay_WON 0.2815584929 -0.4918838 1.12876035 1.06987609
## Dim 5
## Cinematography_NNOM 0.003994189
## Cinematography_NOM -0.489028350
## Cinematography_WON 1.216800033
## Directing_NNOM -0.554851077
## Directing_NOM 0.712251511
## Directing_WON -0.358079754
## Editing_NNOM 0.459427895
## Editing_NOM -0.341356193
## Editing_WON -0.790352271
## Screenplay_NNOM 0.251177034
## Screenplay_NOM -0.230016943
## Screenplay_WON 0.375421821
##################################
# Extracting the quality of representation
# for the descriptor variables
# on the factor map
##################################
DR_MCA_VAR$cos2
## Dim 1 Dim 2 Dim 3 Dim 4
## Cinematography_NNOM 4.739183e-01 0.027819444 0.028430136 0.263737183
## Cinematography_NOM 1.702841e-01 0.006234768 0.028472717 0.507616171
## Cinematography_WON 2.431903e-01 0.142974205 0.260882083 0.047079271
## Directing_NNOM 3.825114e-01 0.122696917 0.032600122 0.048889005
## Directing_NOM 2.046384e-02 0.206639535 0.102518359 0.175363763
## Directing_WON 5.734938e-01 0.024928533 0.046773147 0.094084398
## Editing_NNOM 3.346252e-01 0.089720491 0.056552798 0.013835406
## Editing_NOM 1.835731e-01 0.398008472 0.003589208 0.001864282
## Editing_WON 6.600813e-02 0.279798703 0.237810207 0.015218367
## Screenplay_NNOM 2.026639e-07 0.506124052 0.062583394 0.138950015
## Screenplay_NOM 1.597198e-02 0.124822588 0.504399092 0.032570039
## Screenplay_WON 2.265005e-02 0.069128491 0.364028552 0.327038529
## Dim 5
## Cinematography_NNOM 0.0000277453
## Cinematography_NOM 0.0848592258
## Cinematography_WON 0.1703347802
## Directing_NNOM 0.2711003480
## Directing_NOM 0.3683153067
## Directing_WON 0.0160276388
## Editing_NNOM 0.1980848216
## Editing_NOM 0.0818817653
## Editing_WON 0.0718631616
## Screenplay_NNOM 0.0156163125
## Screenplay_NOM 0.0728729994
## Screenplay_WON 0.0402690125
##################################
# Extracting the contributions
# (in percentage) of the descriptor variables
# to the principal components
##################################
DR_MCA_VAR$contrib
## Dim 1 Dim 2 Dim 3 Dim 4 Dim 5
## Cinematography_NNOM 1.077455e+01 0.7290733 0.8843264 8.55313892 0.001155761
## Cinematography_NOM 7.826995e+00 0.3303453 1.7905550 33.28240706 7.146652781
## Cinematography_WON 1.358197e+01 9.2045183 19.9341857 3.75063238 17.430203052
## Directing_NNOM 1.266652e+01 4.6835308 1.4769637 2.30931050 16.448471491
## Directing_NOM 7.383255e-01 8.5941159 5.0605801 9.02524596 24.348000197
## Directing_WON 3.174569e+01 1.5906685 3.5423414 7.42902749 1.625579588
## Editing_NNOM 1.075004e+01 3.3225378 2.4856688 0.63401803 11.659642215
## Editing_NOM 6.713961e+00 16.7798845 0.1795999 0.09726122 5.487058437
## Editing_WON 3.686497e+00 18.0131253 18.1712472 1.21239133 7.353691928
## Screenplay_NNOM 1.011663e-05 29.1234808 4.2742143 9.89410016 1.428303605
## Screenplay_NOM 4.183816e-01 3.7690681 18.0769838 1.21699970 3.497543972
## Screenplay_WON 1.097067e+00 3.8596514 24.1233338 22.59546726 3.573696974
##################################
# Extracting the quality of representation
# for the descriptor variables
# on the factor map
##################################
(DR_MCA_VariableSquaredCorrelationCircle <- fviz_mca_var(DR_MCA,
col.var = "cos2",
gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"),
legend.title = "Squared Coordinates") +
labs(title = "Multiple Correspondence Analysis : Squared Correlation Circle",
subtitle = "Descriptor Variable Representation Quality",
y = "Principal Component 2",
x = "Principal Component 1") +
theme_classic() +
theme(legend.position="top"))

##################################
# Formulating clusters of the
# descriptor variables
##################################
set.seed(123)
DR_MCA_KMEANS <- kmeans(DR_MCA_VAR$coord, centers=3, nstart=25)
DR_MCA_KMEANS_CLUSTER <- as.factor(DR_MCA_KMEANS$cluster)
##################################
# Extracting the correlation
# between the descriptor variables
# and top principal components
##################################
(DR_MCA_VariableCorrelationCircle <- fviz_mca_var(DR_MCA,
col.var = DR_MCA_KMEANS_CLUSTER,
palette = c("#0073C2FF", "#EFC000FF", "#868686FF"),
legend.title = "Cluster") +
labs(title = "Multiple Correspondence Analysis : Correlation Circle",
subtitle = "Principal Components Versus Descriptor Variable Clusters",
y = "Principal Component 2",
x = "Principal Component 1") +
theme_classic() +
theme(legend.position="top"))

##################################
# Extracting the contribution
# of the descriptor variables
# for the the Top 1 and 2 principal components
##################################
(DR_MCA_PCVariableContributors <- fviz_contrib(DR_MCA,
choice = "var",
axes = 1:2) +
labs(title = "Multiple Correspondence Analysis : Descriptor Variable Contribution",
subtitle = "Principal Components 1 and 2 Contributors",
x = "Descriptors") +
theme_classic() +
theme(axis.text.x = element_text(angle = 90,hjust = 1)))

##################################
# Testing the statistical significance
# of the contribution
# of the descriptor variables
# for the the Top 1 and 2 principal components
##################################
DR_MCA_VariableContributionTest <- dimdesc(DR_MCA,
axes = c(1,2),
proba = 0.05)
##################################
# Extracting the contribution
# of the descriptor variables
# for the Top 1 principal component
##################################
(DR_MCA_PC1VariableContributors <- fviz_contrib(DR_MCA,
choice = "var",
axes = 1) +
labs(title = "Multiple Correspondence Analysis : Descriptor Variable Contribution",
subtitle = "Principal Component 1 Contributors",
x = "Descriptors") +
theme_classic() +
theme(axis.text.x = element_text(angle = 90,hjust = 1)))

##################################
# Testing the statistical significance
# of the contribution
# of the descriptor variables
# for the Top 1 principal component
##################################
DR_MCA_VariableContributionTest$'Dim 1'
##
## Link between the variable and the categorical variable (1-way anova)
## =============================================
## R2 p.value
## Directing 0.7250272 3.283334e-35
## Cinematography 0.5168029 3.746658e-20
## Editing 0.3396347 8.255279e-12
##
## Link between variable abd the categories of the categorical variables
## ================================================================
## Estimate p.value
## Directing=Directing_WON 1.0084996 1.066429e-24
## Cinematography=Cinematography_WON 0.5780799 4.450536e-09
## Editing=Editing_NOM 0.1820318 5.586469e-07
## Cinematography=Cinematography_NOM 0.0957966 1.573813e-06
## Editing=Editing_WON 0.3381234 3.683149e-03
## Editing=Editing_NNOM -0.5201552 1.304909e-12
## Directing=Directing_NNOM -0.7662311 1.192963e-14
## Cinematography=Cinematography_NNOM -0.6738765 5.224619e-19
##################################
# Extracting the contribution
# of the descriptor variables
# for the Top 2 principal component
##################################
(DR_MCA_PC2VariableContributors <- fviz_contrib(DR_MCA,
choice = "var",
axes = 2) +
labs(title = "Multiple Correspondence Analysis : Descriptor Variable Contribution",
subtitle = "Principal Component 2 Contributors",
x = "Descriptors") +
theme_classic() +
theme(axis.text.x = element_text(angle = 90,hjust = 1)))

##################################
# Testing the statistical significance
# of the contribution
# of the descriptor variables
# for the Top 2 principal component
##################################
DR_MCA_VariableContributionTest$'Dim 2'
##
## Link between the variable and the categorical variable (1-way anova)
## =============================================
## R2 p.value
## Editing 0.5309660 6.012674e-21
## Screenplay 0.5119740 6.906284e-20
## Directing 0.2071220 6.326897e-07
## Cinematography 0.1429811 7.566777e-05
##
## Link between variable abd the categories of the categorical variables
## ================================================================
## Estimate p.value
## Screenplay=Screenplay_NNOM 0.71855222 1.007469e-20
## Editing=Editing_WON 0.70077312 1.925781e-10
## Cinematography=Cinematography_WON 0.48954310 1.269630e-05
## Directing=Directing_NNOM 0.16395356 5.798540e-05
## Screenplay=Screenplay_WON -0.41558827 2.935568e-03
## Editing=Editing_NNOM -0.03708562 6.560151e-04
## Screenplay=Screenplay_NOM -0.30296395 4.950520e-05
## Directing=Directing_NOM -0.37116380 8.944524e-08
## Editing=Editing_NOM -0.66368750 2.420712e-15
##################################
# Extracting the MCA individual scores
##################################
DR_MCA_IND <- get_mca_ind(DR_MCA)
##################################
# Extracting the coordinates
# representing the scores
# for the individuals
##################################
DR_MCA_IND$coord[,c("Dim 1","Dim 2")]
## Dim 1 Dim 2
## Avatar 0.84123067 0.53320524
## The Blind Side -0.70199134 0.84130641
## District 9 -0.30684197 -0.34179995
## An Education -0.74412366 0.10800804
## The Hurt Locker 1.52845528 0.58531128
## Inglourious Basterds 0.49876001 -0.72851058
## Precious: Based on the Novel 'Push' by Sapphire 0.17303576 -0.80678321
## A Serious Man -0.74412366 0.10800804
## Up -0.74412366 0.10800804
## Up in the Air -0.41782987 -0.27612752
## Black Swan 0.54089234 0.00478779
## The Fighter 0.01945182 -0.72593552
## Inception 0.03552287 0.63385043
## The Kids Are All Right -0.74412366 0.10800804
## The King's Speech 1.43125043 -0.39417077
## 127 Hours -0.30684197 -0.34179995
## The Social Network 0.74954880 0.17012377
## Toy Story 3 -0.74412366 0.10800804
## True Grit 0.06147832 -0.27870259
## Winter's Bone -0.74412366 0.10800804
## The Artist 1.27766650 -0.31332307
## The Descendants 0.17303576 -0.80678321
## Extremely Loud & Incredibly Close -0.70199134 0.84130641
## The Help -0.70199134 0.84130641
## Hugo 0.79909835 -0.20009313
## Midnight in Paris -0.26424593 -0.35697522
## Moneyball -0.30684197 -0.34179995
## The Tree of Life 0.10361065 0.45459578
## War Horse -0.22268314 0.83873134
## Amour -0.41782987 -0.27612752
## Argo -0.05605319 0.55683439
## Beasts of the Southern Wild -0.41782987 -0.27612752
## Django Unchained -0.11123153 0.02458528
## Les Miserables -0.70199134 0.84130641
## Life of Pi 1.57800484 0.21509438
## Lincoln 0.49876001 -0.72851058
## Silver Linings Playbook 0.01945182 -0.72593552
## Zero Dark Thirty -0.30684197 -0.34179995
## American Hustle 0.01945182 -0.72593552
## Captain Phillips -0.30684197 -0.34179995
## Dallas Buyers Club -0.30684197 -0.34179995
## Gravity 1.71734201 1.92787479
## Her -0.59053972 0.02716034
## Nebraska 0.06147832 -0.27870259
## Philomena -0.74412366 0.10800804
## 12 Years a Slave 0.17303576 -0.80678321
## The Wolf of Wall Street -0.41782987 -0.27612752
## American Sniper -0.30684197 -0.34179995
## Birdman or (The Unexpected Virtue of Ignorance) 1.29430709 0.58405467
## Boyhood 0.01945182 -0.72593552
## The Grand Budapest Hotel 0.49876001 -0.72851058
## The Imitation Game 0.17303576 -0.80678321
## Selma -0.70199134 0.84130641
## The Theory of Everything -0.74412366 0.10800804
## Whiplash -0.20963712 0.63768209
## The Big Short 0.17303576 -0.80678321
## Bridge of Spies -0.74412366 0.10800804
## Brooklyn -0.74412366 0.10800804
## Mad Max: Fury Road 0.63809718 0.98426984
## The Martian -0.74412366 0.10800804
## The Revenant 1.62013716 0.94839275
## Room -0.41782987 -0.27612752
## Spotlight 0.17303576 -0.80678321
## Arrival 0.49876001 -0.72851058
## Fences -0.74412366 0.10800804
## Hacksaw Ridge 0.15878899 0.98684490
## Hell or High Water -0.30684197 -0.34179995
## Hidden Figures -0.74412366 0.10800804
## La La Land 1.57800484 0.21509438
## Lion -0.26481547 0.10543298
## Manchester by the Sea -0.26424593 -0.35697522
## Moonlight 0.65234395 -0.80935828
## Call Me by Your Name -0.59053972 0.02716034
## Darkest Hour -0.22268314 0.83873134
## Dunkirk 0.63809718 0.98426984
## Get Out -0.26424593 -0.35697522
## Lady Bird -0.41782987 -0.27612752
## Phantom Thread -0.37569755 0.45717084
## The Post -0.70199134 0.84130641
## The Shape of Water 1.27766650 -0.31332307
## Three Billboards outside Ebbing, Missouri -0.30684197 -0.34179995
## Black Panther -0.70199134 0.84130641
## BlacKkKlansman 0.17303576 -0.80678321
## Bohemian Rhapsody -0.16750480 1.37098046
## The Favourite 0.49876001 -0.72851058
## Green Book -0.15325803 -0.42264765
## Roma 1.14072315 0.66490237
## A Star Is Born -0.26481547 0.10543298
## Vice 0.01945182 -0.72593552
## Ford v Ferrari -0.16750480 1.37098046
## The Irishman 0.49876001 -0.72851058
## Jojo Rabbit -0.15325803 -0.42264765
## Joker 0.49876001 -0.72851058
## Little Women -0.74412366 0.10800804
## Marriage Story -0.74412366 0.10800804
## 1917 0.36181666 0.24971486
## Once upon a Time...in Hollywood 0.06147832 -0.27870259
## Parasite 0.95194224 -0.39159571
## The Father -0.15325803 -0.42264765
## Judas and the Black Messiah -0.26481547 0.10543298
## Mank 0.40394899 0.98301323
## Minari -0.41782987 -0.27612752
## Nomadland 1.27766650 -0.31332307
## Promising Young Woman 0.17303576 -0.80678321
## Sound of Metal -0.20963712 0.63768209
## The Trial of the Chicago 7 0.17246622 -0.34437502
## Belfast -0.26424593 -0.35697522
## CODA -0.59053972 0.02716034
## Don't Look Up -0.30684197 -0.34179995
## Drive My Car -0.41782987 -0.27612752
## Dune 0.57000941 1.16352448
## King Richard -0.30684197 -0.34179995
## Licorice Pizza -0.41782987 -0.27612752
## Nightmare Alley -0.22268314 0.83873134
## The Power of the Dog 1.27766650 -0.31332307
## West Side Story 0.10361065 0.45459578
## All Quiet on the Western Front 0.03552287 0.63385043
## Avatar: The Way of Water -0.70199134 0.84130641
## The Banshees of Inisherin 0.01945182 -0.72593552
## Elvis 0.21459854 0.38892335
## Everything Everywhere All at Once 1.04914709 0.58788634
## The Fabelmans -0.41782987 -0.27612752
## Tar 0.49876001 -0.72851058
## Top Gun: Maverick -0.30684197 -0.34179995
## Triangle of Sadness -0.41782987 -0.27612752
## Women Talking -0.59053972 0.02716034
##################################
# Extracting the quality of representation
# for the individuals
# on the factor map
##################################
DR_MCA_IND$cos2[,c("Dim 1","Dim 2")]
## Dim 1 Dim 2
## Avatar 0.1822392975 7.321510e-02
## The Blind Side 0.2891900021 4.153633e-01
## District 9 0.0975744805 1.210740e-01
## An Education 0.6324263704 1.332392e-02
## The Hurt Locker 0.4061061887 5.955359e-02
## Inglourious Basterds 0.1568324557 3.345988e-01
## Precious: Based on the Novel 'Push' by Sapphire 0.0174193322 3.786810e-01
## A Serious Man 0.6324263704 1.332392e-02
## Up 0.6324263704 1.332392e-02
## Up in the Air 0.1865210265 8.146072e-02
## Black Swan 0.1211620717 9.493257e-06
## The Fighter 0.0003690130 5.139458e-01
## Inception 0.0004343988 1.383076e-01
## The Kids Are All Right 0.6324263704 1.332392e-02
## The King's Speech 0.5205372305 3.948111e-02
## 127 Hours 0.0975744805 1.210740e-01
## The Social Network 0.1371317088 7.064282e-03
## Toy Story 3 0.6324263704 1.332392e-02
## True Grit 0.0025251344 5.189464e-02
## Winter's Bone 0.6324263704 1.332392e-02
## The Artist 0.5035538379 3.028277e-02
## The Descendants 0.0174193322 3.786810e-01
## Extremely Loud & Incredibly Close 0.2891900021 4.153633e-01
## The Help 0.2891900021 4.153633e-01
## Hugo 0.2090417116 1.310680e-02
## Midnight in Paris 0.0428515726 7.820352e-02
## Moneyball 0.0975744805 1.210740e-01
## The Tree of Life 0.0046167227 8.887422e-02
## War Horse 0.0218946344 3.106052e-01
## Amour 0.1865210265 8.146072e-02
## Argo 0.0009039724 8.920859e-02
## Beasts of the Southern Wild 0.1865210265 8.146072e-02
## Django Unchained 0.0058091073 2.837944e-04
## Les Miserables 0.2891900021 4.153633e-01
## Life of Pi 0.5286438743 9.822107e-03
## Lincoln 0.1568324557 3.345988e-01
## Silver Linings Playbook 0.0003690130 5.139458e-01
## Zero Dark Thirty 0.0975744805 1.210740e-01
## American Hustle 0.0003690130 5.139458e-01
## Captain Phillips 0.0975744805 1.210740e-01
## Dallas Buyers Club 0.0975744805 1.210740e-01
## Gravity 0.4009246716 5.052506e-01
## Her 0.2222612387 4.701495e-04
## Nebraska 0.0025251344 5.189464e-02
## Philomena 0.6324263704 1.332392e-02
## 12 Years a Slave 0.0174193322 3.786810e-01
## The Wolf of Wall Street 0.1865210265 8.146072e-02
## American Sniper 0.0975744805 1.210740e-01
## Birdman or (The Unexpected Virtue of Ignorance) 0.3152206556 6.418699e-02
## Boyhood 0.0003690130 5.139458e-01
## The Grand Budapest Hotel 0.1568324557 3.345988e-01
## The Imitation Game 0.0174193322 3.786810e-01
## Selma 0.2891900021 4.153633e-01
## The Theory of Everything 0.6324263704 1.332392e-02
## Whiplash 0.0157958513 1.461555e-01
## The Big Short 0.0174193322 3.786810e-01
## Bridge of Spies 0.6324263704 1.332392e-02
## Brooklyn 0.6324263704 1.332392e-02
## Mad Max: Fury Road 0.0962125874 2.289215e-01
## The Martian 0.6324263704 1.332392e-02
## The Revenant 0.4738973318 1.623892e-01
## Room 0.1865210265 8.146072e-02
## Spotlight 0.0174193322 3.786810e-01
## Arrival 0.1568324557 3.345988e-01
## Fences 0.6324263704 1.332392e-02
## Hacksaw Ridge 0.0068681011 2.652734e-01
## Hell or High Water 0.0975744805 1.210740e-01
## Hidden Figures 0.6324263704 1.332392e-02
## La La Land 0.5286438743 9.822107e-03
## Lion 0.0488234219 7.739171e-03
## Manchester by the Sea 0.0428515726 7.820352e-02
## Moonlight 0.1866741937 2.873509e-01
## Call Me by Your Name 0.2222612387 4.701495e-04
## Darkest Hour 0.0218946344 3.106052e-01
## Dunkirk 0.0962125874 2.289215e-01
## Get Out 0.0428515726 7.820352e-02
## Lady Bird 0.1865210265 8.146072e-02
## Phantom Thread 0.0799943330 1.184512e-01
## The Post 0.2891900021 4.153633e-01
## The Shape of Water 0.5035538379 3.028277e-02
## Three Billboards outside Ebbing, Missouri 0.0975744805 1.210740e-01
## Black Panther 0.2891900021 4.153633e-01
## BlacKkKlansman 0.0174193322 3.786810e-01
## Bohemian Rhapsody 0.0077706989 5.205568e-01
## The Favourite 0.1568324557 3.345988e-01
## Green Book 0.0141629138 1.077117e-01
## Roma 0.2815961296 9.567136e-02
## A Star Is Born 0.0488234219 7.739171e-03
## Vice 0.0003690130 5.139458e-01
## Ford v Ferrari 0.0077706989 5.205568e-01
## The Irishman 0.1568324557 3.345988e-01
## Jojo Rabbit 0.0141629138 1.077117e-01
## Joker 0.1568324557 3.345988e-01
## Little Women 0.6324263704 1.332392e-02
## Marriage Story 0.6324263704 1.332392e-02
## 1917 0.0441474907 2.102895e-02
## Once upon a Time...in Hollywood 0.0025251344 5.189464e-02
## Parasite 0.2685401891 4.544268e-02
## The Father 0.0141629138 1.077117e-01
## Judas and the Black Messiah 0.0488234219 7.739171e-03
## Mank 0.0430107950 2.547083e-01
## Minari 0.1865210265 8.146072e-02
## Nomadland 0.5035538379 3.028277e-02
## Promising Young Woman 0.0174193322 3.786810e-01
## Sound of Metal 0.0157958513 1.461555e-01
## The Trial of the Chicago 7 0.0194954517 7.772997e-02
## Belfast 0.0428515726 7.820352e-02
## CODA 0.2222612387 4.701495e-04
## Don't Look Up 0.0975744805 1.210740e-01
## Drive My Car 0.1865210265 8.146072e-02
## Dune 0.0675271205 2.813619e-01
## King Richard 0.0975744805 1.210740e-01
## Licorice Pizza 0.1865210265 8.146072e-02
## Nightmare Alley 0.0218946344 3.106052e-01
## The Power of the Dog 0.5035538379 3.028277e-02
## West Side Story 0.0046167227 8.887422e-02
## All Quiet on the Western Front 0.0004343988 1.383076e-01
## Avatar: The Way of Water 0.2891900021 4.153633e-01
## The Banshees of Inisherin 0.0003690130 5.139458e-01
## Elvis 0.0195617540 6.425135e-02
## Everything Everywhere All at Once 0.2120081487 6.656816e-02
## The Fabelmans 0.1865210265 8.146072e-02
## Tar 0.1568324557 3.345988e-01
## Top Gun: Maverick 0.0975744805 1.210740e-01
## Triangle of Sadness 0.1865210265 8.146072e-02
## Women Talking 0.2222612387 4.701495e-04
##################################
# Extracting the contributions
# (in percentage) of the individuals
# to the principal components
##################################
DR_MCA_IND$contrib[,c("Dim 1","Dim 2")]
## Dim 1 Dim 2
## Avatar 1.399033677 6.479086e-01
## The Blind Side 0.974229942 1.612996e+00
## District 9 0.186134766 2.662373e-01
## An Education 1.094682550 2.658503e-02
## The Hurt Locker 4.618526525 7.807261e-01
## Inglourious Basterds 0.491791730 1.209474e+00
## Precious: Based on the Novel 'Push' by Sapphire 0.059192909 1.483333e+00
## A Serious Man 1.094682550 2.658503e-02
## Up 1.094682550 2.658503e-02
## Up in the Air 0.345141303 1.737578e-01
## Black Swan 0.578388464 5.223903e-05
## The Fighter 0.000748029 1.200939e+00
## Inception 0.002494676 9.155846e-01
## The Kids Are All Right 1.094682550 2.658503e-02
## The King's Speech 4.049759486 3.540738e-01
## 127 Hours 0.186134766 2.662373e-01
## The Social Network 1.110702608 6.595609e-02
## Toy Story 3 1.094682550 2.658503e-02
## True Grit 0.007472088 1.770137e-01
## Winter's Bone 1.094682550 2.658503e-02
## The Artist 3.227252762 2.237225e-01
## The Descendants 0.059192909 1.483333e+00
## Extremely Loud & Incredibly Close 0.974229942 1.612996e+00
## The Help 0.974229942 1.612996e+00
## Hugo 1.262404218 9.124084e-02
## Midnight in Paris 0.138043062 2.904030e-01
## Moneyball 0.186134766 2.662373e-01
## The Tree of Life 0.021222998 4.709510e-01
## War Horse 0.098033082 1.603137e+00
## Amour 0.345141303 1.737578e-01
## Argo 0.006211530 7.066055e-01
## Beasts of the Southern Wild 0.345141303 1.737578e-01
## Django Unchained 0.024459849 1.377448e-03
## Les Miserables 0.974229942 1.612996e+00
## Life of Pi 4.922827606 1.054346e-01
## Lincoln 0.491791730 1.209474e+00
## Silver Linings Playbook 0.000748029 1.200939e+00
## Zero Dark Thirty 0.186134766 2.662373e-01
## American Hustle 0.000748029 1.200939e+00
## Captain Phillips 0.186134766 2.662373e-01
## Dallas Buyers Club 0.186134766 2.662373e-01
## Gravity 5.830577249 8.469984e+00
## Her 0.689439557 1.681107e-03
## Nebraska 0.007472088 1.770137e-01
## Philomena 1.094682550 2.658503e-02
## 12 Years a Slave 0.059192909 1.483333e+00
## The Wolf of Wall Street 0.345141303 1.737578e-01
## American Sniper 0.186134766 2.662373e-01
## Birdman or (The Unexpected Virtue of Ignorance) 3.311865001 7.773774e-01
## Boyhood 0.000748029 1.200939e+00
## The Grand Budapest Hotel 0.491791730 1.209474e+00
## The Imitation Game 0.059192909 1.483333e+00
## Selma 0.974229942 1.612996e+00
## The Theory of Everything 1.094682550 2.658503e-02
## Whiplash 0.086882910 9.266876e-01
## The Big Short 0.059192909 1.483333e+00
## Bridge of Spies 1.094682550 2.658503e-02
## Brooklyn 1.094682550 2.658503e-02
## Mad Max: Fury Road 0.804955036 2.207767e+00
## The Martian 1.094682550 2.658503e-02
## The Revenant 5.189213436 2.049752e+00
## Room 0.345141303 1.737578e-01
## Spotlight 0.059192909 1.483333e+00
## Arrival 0.491791730 1.209474e+00
## Fences 1.094682550 2.658503e-02
## Hacksaw Ridge 0.049846967 2.219334e+00
## Hell or High Water 0.186134766 2.662373e-01
## Hidden Figures 1.094682550 2.658503e-02
## La La Land 4.922827606 1.054346e-01
## Lion 0.138638759 2.533249e-02
## Manchester by the Sea 0.138043062 2.904030e-01
## Moonlight 0.841300693 1.492817e+00
## Call Me by Your Name 0.689439557 1.681107e-03
## Darkest Hour 0.098033082 1.603137e+00
## Dunkirk 0.804955036 2.207767e+00
## Get Out 0.138043062 2.904030e-01
## Lady Bird 0.345141303 1.737578e-01
## Phantom Thread 0.279045282 4.763015e-01
## The Post 0.974229942 1.612996e+00
## The Shape of Water 3.227252762 2.237225e-01
## Three Billboards outside Ebbing, Missouri 0.186134766 2.662373e-01
## Black Panther 0.974229942 1.612996e+00
## BlacKkKlansman 0.059192909 1.483333e+00
## Bohemian Rhapsody 0.055469275 4.283388e+00
## The Favourite 0.491791730 1.209474e+00
## Green Book 0.046434894 4.070819e-01
## Roma 2.572518321 1.007490e+00
## A Star Is Born 0.138638759 2.533249e-02
## Vice 0.000748029 1.200939e+00
## Ford v Ferrari 0.055469275 4.283388e+00
## The Irishman 0.491791730 1.209474e+00
## Jojo Rabbit 0.046434894 4.070819e-01
## Joker 0.491791730 1.209474e+00
## Little Women 1.094682550 2.658503e-02
## Marriage Story 1.094682550 2.658503e-02
## 1917 0.258806450 1.421064e-01
## Once upon a Time...in Hollywood 0.007472088 1.770137e-01
## Parasite 1.791509708 3.494627e-01
## The Father 0.046434894 4.070819e-01
## Judas and the Black Messiah 0.138638759 2.533249e-02
## Mank 0.322590084 2.202134e+00
## Minari 0.345141303 1.737578e-01
## Nomadland 3.227252762 2.237225e-01
## Promising Young Woman 0.059192909 1.483333e+00
## Sound of Metal 0.086882910 9.266876e-01
## The Trial of the Chicago 7 0.058803891 2.702640e-01
## Belfast 0.138043062 2.904030e-01
## CODA 0.689439557 1.681107e-03
## Don't Look Up 0.186134766 2.662373e-01
## Drive My Car 0.345141303 1.737578e-01
## Dune 0.642335633 3.085148e+00
## King Richard 0.186134766 2.662373e-01
## Licorice Pizza 0.345141303 1.737578e-01
## Nightmare Alley 0.098033082 1.603137e+00
## The Power of the Dog 3.227252762 2.237225e-01
## West Side Story 0.021222998 4.709510e-01
## All Quiet on the Western Front 0.002494676 9.155846e-01
## Avatar: The Way of Water 0.974229942 1.612996e+00
## The Banshees of Inisherin 0.000748029 1.200939e+00
## Elvis 0.091044038 3.447093e-01
## Everything Everywhere All at Once 2.176059309 7.876108e-01
## The Fabelmans 0.345141303 1.737578e-01
## Tar 0.491791730 1.209474e+00
## Top Gun: Maverick 0.186134766 2.662373e-01
## Triangle of Sadness 0.345141303 1.737578e-01
## Women Talking 0.689439557 1.681107e-03
##################################
# Extracting the correlation
# between the individual instances
# grouped by Picture categories
# and top principal components
##################################
(DR_MCA_IndividualCorrelationCircleByPicture <- fviz_mca_ind(DR_MCA,
geom.ind = "text",
col.ind = SD_MCA$Picture,
palette = c("#888888","#5544FF"),
repel = TRUE,
legend.title = "Picture",
addEllipses = FALSE) +
labs(title = "Multiple Correspondence Analysis : Correlation Circle",
subtitle = "Principal Components Versus Individuals Grouped by Picture Categories",
y = "Principal Component 2",
x = "Principal Component 1") +
theme_classic() +
theme(legend.position="top"))

##################################
# Extracting the correlation
# between the individual instances
# grouped by Year categories
# and top principal components
##################################
(DR_MCA_IndividualCorrelationCircleByYear <- fviz_mca_ind(DR_MCA,
geom.ind = "text",
col.ind = SD_MCA$Year,
repel = TRUE,
legend.title = "Year",
addEllipses = FALSE) +
labs(title = "Multiple Correspondence Analysis : Correlation Circle",
subtitle = "Principal Components Versus Individuals Grouped by Picture Categories",
y = "Principal Component 2",
x = "Principal Component 1") +
theme_classic() +
theme(legend.position="top"))

##################################
# Extracting the contribution
# of the descriptor variables
# for the Top 1 and Top 2 principal components
##################################
(DR_MCA_PCIndividualContributors <- fviz_contrib(DR_MCA,
choice = "ind",
axes = 1:2,
top = 10) +
labs(title = "Multiple Correspondence Analysis : Individual Contribution",
subtitle = "Principal Components 1 and 2 Contributors",
x = "Descriptors") +
theme_classic() +
theme(axis.text.x = element_text(angle = 90,hjust = 1)))

##################################
# Extracting the contribution
# of the descriptor variables
# for the Top 1 principal component
##################################
(DR_MCA_PC1IndividualContributors <- fviz_contrib(DR_MCA,
choice = "ind",
axes = 1,
top = 10) +
labs(title = "Multiple Correspondence Analysis : Individual Contribution",
subtitle = "Principal Component 1 Contributors",
x = "Descriptors") +
theme_classic() +
theme(axis.text.x = element_text(angle = 90,hjust = 1)))

##################################
# Extracting the contribution
# of the descriptor variables
# for the Top 2 principal component
##################################
(DR_MCA_PC2IndividualContributors <- fviz_contrib(DR_MCA,
choice = "ind",
axes = 2,
top = 10) +
labs(title = "Multiple Correspondence Analysis : Individual Contribution",
subtitle = "Principal Component 2 Contributors",
x = "Descriptors") +
theme_classic() +
theme(axis.text.x = element_text(angle = 90,hjust = 1)))

##################################
# Formulating the Biplot
# using the individual scores
# and descriptor variable loadings
# for the Top 1 and Top 2 principal components
##################################
(DR_MCA_Biplot <- fviz_mca_biplot(DR_MCA,
label = c("var","ind"),
repel = TRUE,
habillage = SD_MCA$Picture,
palette = c("#888888","#5544FF"),
col.var = "#FF5050",
legend.title = "Picture",
addEllipses = FALSE) +
labs(title = "Multiple Correspondence Analysis : Biplot",
subtitle = "Individual Scores versus Descriptor Variable Loadings",
x = "Principal Component 1",
y = "Principal Component 2") +
theme_classic() +
theme(legend.position="top"))
